Re: Riak replication and quorum

2011-05-26 Thread Mathias Meyer
Peter,

My replies are inline.

Mathias Meyer
Developer Advocate, Basho Technologies


On Freitag, 13. Mai 2011 at 20:05, Peter Fales wrote:

> Sean,
> 
> Thanks to you and Ben for clarifying how that works. Since that was 
> so helpful, I'll ask a followup question, and also a question on 
> a mostly un-related topic...
> 
> 1) When I've removed a couple of nodes and the remaining nodes pick up 
> the slack, is there any way for me to look under the hood and see that?
> I'm using wget to fetch the '.../stats' URL from one of the remaining 
> live nodes, and under ring_ownership it still lists the original 4
> nodes, each one owning 1/4 of the total partitions. That's part of the
> reason why I didn't think the data ownership had been moved.
> 
Ring ownership is only affected by nodes explicitly entering and leaving the 
cluster. Unless you explicitly tell the cluster to remove a node, or explicitly 
tell that node to leave the cluster, ownership will remain the same even in 
case of a failure on one or more nodes. Data ownership is moved around 
implicitly in case of failure. By looking at the preference list, the 
coordinating node simply picks the next node(s) to pick up the slack for the 
failed one(s).

The only way to find out whether a handoff is currently happening between two 
nodes is to look at the logs, which indicate the beginning and end of each 
transfer. The cluster state, and therefore the stats, don't yet take 
re-partitioning or handoff into account.
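The fallback behaviour described above can be sketched in a few lines of Ruby. This is only an illustration of the idea, not Riak's actual preflist code, and the node names are invented:

```ruby
# Illustrative sketch of preference-list fallback (not Riak's real code):
# the coordinating node walks the preference list for a key and picks the
# first N reachable nodes, so fallbacks implicitly cover failed primaries.
def fallback_preflist(preference_list, up_nodes, n_val)
  preference_list.select { |node| up_nodes.include?(node) }.first(n_val)
end

preflist = %w[node1 node2 node3 node4]
up       = %w[node1 node3 node4]          # node2 has failed
fallback_preflist(preflist, up, 3)
# => ["node1", "node3", "node4"] (node4 picks up the slack for node2)
```

Note that ring ownership is unchanged throughout; only the set of nodes actually answering requests shifts.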
> 2) My test involves sending a large number of read/write requests to the 
> cluster from multiple client connections and timing how long each request
> takes. I find that the vast majority of the requests are processed 
> quickly (a few milliseconds to 10s of milliseconds). However, every once
> in while, the server seems to "hang" for a while. When that happens
> the response can take several hundred milliseconds or even several 
> seconds. Is this something that is known and/or expected? There 
> doesn't seem to be any pattern to how often it happens -- typically 
> I'll see it a "few" times during a 10-minute test run. Sometimes
> it will go for several minutes without a problem. I haven't ruled
> out a problem with my test client, but it's fairly simple-minded C++
> program using the protocol buffers interface, so I don't think there
> is too much that can go wrong on that end.
> 
The easiest way to find out whether something is stalling is to look at the 
stats and the percentiles for the put and get FSMs, which are responsible for 
taking care of reads and writes. Look for the JSON keys node_get_fsm_time_* and 
node_put_fsm_time_*. If anything jumps out here during and shortly after your 
benchmark run, something on the Riak or EC2 end is probably waiting for 
something else.
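As a sketch of what to look for, the snippet below filters a stats payload down to those keys. The JSON values here are invented for illustration; a real payload would come from the node's /stats endpoint (e.g. `wget -qO- http://127.0.0.1:8098/stats`):

```ruby
require 'json'

# Hypothetical /stats payload; the timing values (microseconds) are invented.
stats_json = <<~JSON
  { "node_get_fsm_time_mean": 3245,
    "node_get_fsm_time_95": 12876,
    "node_get_fsm_time_100": 2534112,
    "node_put_fsm_time_mean": 4102,
    "node_put_fsm_time_100": 1873400,
    "vnode_gets_total": 182332 }
JSON

stats = JSON.parse(stats_json)

# Keep only the get/put FSM timing keys and print them in milliseconds.
fsm_times = stats.select { |k, _| k =~ /\Anode_(get|put)_fsm_time/ }
fsm_times.each { |k, v| puts format('%-28s %10.1f ms', k, v / 1000.0) }
```

A large gap between the mean and the 95th/100th percentiles after a benchmark run is the kind of "jump" worth investigating.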

Are you using EBS in any way for storing Riak's data? If so, what kind of setup 
do you have, single volume or RAID? 

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Clarification on key filters

2011-05-26 Thread Mathias Meyer
Jeremiah,

I sure hope you have not been drinking mouthwash.

The wiki is indeed showing single key filters, and that's confusing. In 
hindsight, it confused me too when I worked with key filters from the wiki page 
for the first time. I'll make sure we put some proper examples on that page to 
clarify how they should end up looking when multiple are put together. 

The bottom line is that Ripple does produce proper key filter code with 
conditions and that you are absolutely correct in bringing up this slight 
confusion.
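As a rough illustration of how composed filters nest (following the wiki's format quoted below, where "or" takes two complete filter pipelines as arguments), a combined filter might be assembled like this. The bucket name and filter values are just examples:

```ruby
require 'json'

# Illustrative composition of key filters for an MR job: the top level is a
# list of filters, and an "or" filter wraps two complete filter pipelines,
# matching the wiki's ["or", [[...]], [[...]]] shape.
key_filters = [
  ["tokenize", "-", 1],
  ["or", [["eq", "google"]], [["less_than", "g"]]]
]

inputs = { "bucket" => "invoices", "key_filters" => key_filters }
puts JSON.generate("inputs" => inputs)
```

The point is simply that each branch of the "or" is itself a full list of filters, not a bare filter command.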

Mathias Meyer
Developer Advocate, Basho Technologies

On Donnerstag, 26. Mai 2011 at 03:34, Jeremiah Peschka wrote:

> An MR job has 0 or more key filters. Adding a few transforms generates a 
> lovely list of lists:
> { "inputs":{ "bucket":"invoices", "key_filters":[["tokenize", "-", 1], 
> ["to_lower"], ["matches", "solutions"]] }, // ... }
> That makes sense: we've got a list of key filters. And a key filter is, in 
> effect, a list of arguments.
> 
> The complex predicates are throwing me off. The Ripple spec in 
> filter_builder_spec.rb [1] shows that 
> 
> subject.OR do
> starts_with "foo"
> ends_with "bar"
> end
> 
> 
> becomes
> 
> [[:or, [[:starts_with, "foo"], [:ends_with, "bar"]]]]
> 
> Which is not at all the same as what the wiki says an OR should look like: 
> 
> ["or", [["eq", "google"]], [["less_than", "g"]]]
> 
> Apart from the obvious difference in syntax, have I been drinking mouthwash 
> or did the wiki suddenly switch from showing a complete key filter condition 
> to showing an individual key filter command for AND, OR, and NOT? 
> 
> Jeremiah
> ___
> riak-users mailing list
> riak-users@lists.basho.com (mailto:riak-users@lists.basho.com)
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Old values on put call

2011-05-26 Thread Mathias Meyer
Anthony,

Currently there's no simple way to achieve what you're after. The API is 
modeled around a get-before-you-put pattern.

You can specify the option return_body when putting new data into Riak, like so:

{ok, RObj} = riak_pb_socket:put(.., NewObject, [return_body]),

That returns the freshly written data in the result. It's not exactly what you 
want, but it can be used to achieve what you're after by deliberately creating 
siblings, e.g. by leaving out the vector clock. Specifying the return_body flag 
will then return the list of siblings created by this request.

Be aware that abusing this technique with too many writes and not properly 
reconciling siblings may cause your Riak objects to grow in an unhealthy 
manner, and it's not exactly a recommended way of doing things.

It's an option, but certainly not as simple as the one you're after, unless 
you're prepared to deal with the potential conflicts, e.g. by handling siblings 
immediately after you reconcile the differences between the two objects in your 
compare() function; see [1] for more details.
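The get-before-put pattern with a compare step can be sketched in plain Ruby. FakeStore is a hypothetical in-memory stand-in for the client, and compare is one possible field-level diff:

```ruby
# Sketch of the get-before-put compare pattern Anthony describes, with a
# hypothetical in-memory store standing in for riak_pb_socket.
class FakeStore
  def initialize
    @data = {}
  end

  def get(key)
    @data[key]
  end

  def put(key, value)
    @data[key] = value
  end
end

# Report which fields changed between the old and new object.
def compare(old_obj, new_obj)
  return { added: new_obj } if old_obj.nil?
  new_obj.reject { |k, v| old_obj[k] == v }
end

store = FakeStore.new
store.put("invoice-1", { "total" => 100, "status" => "open" })

old_obj = store.get("invoice-1")                # GET before PUT
new_obj = { "total" => 120, "status" => "open" }
store.put("invoice-1", new_obj)                 # PUT the whole structure
compare(old_obj, new_obj)
# => {"total"=>120} (only the changed field)
```

With a real client the extra GET is what this approach pays for; the sibling trick above trades that cost for conflict handling instead.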

Mathias Meyer
Developer Advocate, Basho Technologies

[1] http://wiki.basho.com/Vector-Clocks.html

On Mittwoch, 18. Mai 2011 at 23:28, Anthony Molinaro wrote:

> Hi,
> 
>  I'm working on an application where I am storing an erlang external term
> format in riak. The entire structure gets updated at once, but I'd like to
> see what is changing, so I have something like this (in pseudo code).
> 
> NewObject = construct_new(...),
> OldObject = riak_pb_socket:get(...),
> ok = riak_pb_socket:put (.., NewObject),
> compare (OldObject, NewObject),
> 
> The idea being that I am updating the object everytime, but I would like
> to have a general idea what has changed.
> 
> So I was wondering if there are any options for put to return the previous
> value? That would allow me to remove the call to get and simply do something
> like.
> 
> NewObject = construct_new(...),
> OldObject = riak_pb_socket:put (.., NewObject),
> compare (OldObject, NewObject),
> 
> Now I assume what I get back would depend a lot on what the w value is, but
> in most cases, I tend to use the defaults. Also, I would think the old
> values could be returned as a list in some cases where there was disagreement.
> 
> Anyway, would something like this be hard to implement in riak itself
> (it's sort of a specialized use case, but I could see it being useful
> in cases like mine where you always want a put to succeed, but you might
> want to check what changed for tracking reasons, and I do understand that
> you won't be absolutely accurate all the time, but I mostly am looking for
> something scalable and mostly accurate).
> 
> -Anthony
> 
> -- 
> 
> Anthony Molinaro <antho...@alumni.caltech.edu>
> 
> ___
> riak-users mailing list
> riak-users@lists.basho.com (mailto:riak-users@lists.basho.com)
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: riaksearch: using index docs in place of real objects

2011-05-26 Thread Mathias Meyer
Greg,

Riak Search stores indexed documents in Riak KV too, as serialized Erlang 
terms. You can easily verify that by requesting a document from 
http://riak.host:8098/riak/_rsid_/key.

So whenever you query something through the Solr interface the documents you 
get back are fetched from these buckets, and therefore the same distribution 
and consistency properties apply to them as to objects stored directly in Riak 
KV. Bottom line is there's nothing wrong with just using them instead of 
fetching them again from Riak KV.
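One practical consequence of Greg's observation that int fields come back as strings in index docs: if application code expects the Riak-object types, those fields need converting back. A minimal sketch, with an assumed (hypothetical) list of integer fields:

```ruby
# Restore integer types on index-doc fields before handing the doc to
# application code. The field names here are an assumed schema; adjust to
# whatever your Riak objects actually contain.
INT_FIELDS = %w[amount quantity year].freeze

def normalize_index_doc(doc)
  doc.merge(doc.slice(*INT_FIELDS).transform_values { |v| Integer(v, 10) })
end

index_doc = { "id" => "inv-1", "amount" => "1200", "year" => "2011" }
normalize_index_doc(index_doc)
# => {"id"=>"inv-1", "amount"=>1200, "year"=>2011}
```

With that in place, the index docs returned by the Solr interface can be used directly, as described above.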

Mathias Meyer
Developer Advocate, Basho Technologies


On Mittwoch, 25. Mai 2011 at 00:34, Greg Pascale wrote:

> Hi,
> 
> In our data model, our riak objects are flat JSON objects, and thus their 
> corresponding index documents are nearly identical - the only difference is 
> that a few fields which are ints in the riak objects are strings in the index 
> doc. 
> 
> Since they are so similar, we are directly using the index docs returned from 
> our search call, skipping the second step of doing gets on the returned keys 
> to retrieve the real objects.
> 
> Is this advisable? Are there any circumstances under which we might run into 
> consistency issues?
> 
> Thanks,
> -Greg
> 
> ___
> riak-users mailing list
> riak-users@lists.basho.com (mailto:riak-users@lists.basho.com)
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Clarification on key filters

2011-05-26 Thread Jeremiah Peschka
Thanks for the clarification. I'll use Ripple to generate some 'canonical' 
strings to work with while I do other, more nefarious, things with them.

Much appreciated. 
--- 
Jeremiah Peschka
Founder, Brent Ozar PLF


On Thursday, May 26, 2011 at 2:42 AM, Mathias Meyer wrote:

> Jeremiah,
> 
> I sure hope you have not been drinking mouthwash.
> 
> The wiki is indeed showing single key filters, and that's confusing. In 
> hindsight, it confused me too when I worked with key filters from the wiki 
> page for the first time. I'll make sure we put some proper examples on that 
> page to clarify how they should end up looking when multiple are put 
> together. 
> 
> The bottom line is that Ripple does produce proper key filter code with 
> conditions and that you are absolutely correct in bringing up this slight 
> confusion.
> 
> Mathias Meyer
> Developer Advocate, Basho Technologies
> 
> On Donnerstag, 26. Mai 2011 at 03:34, Jeremiah Peschka wrote:
> 
> > An MR job has 0 or more key filters. Adding a few transforms generates a 
> > lovely list of lists:
> > { "inputs":{ "bucket":"invoices", "key_filters":[["tokenize", "-", 1], 
> > ["to_lower"], ["matches", "solutions"]] }, // ... }
> > That makes sense: we've got a list of key filters. And a key filter is, in 
> > effect, a list of arguments.
> > 
> > The complex predicates are throwing me off. The Ripple spec in 
> > filter_builder_spec.rb [1] shows that 
> > 
> > subject.OR do
> > starts_with "foo"
> > ends_with "bar"
> > end
> > 
> > 
> > becomes
> > 
> > [[:or, [[:starts_with, "foo"], [:ends_with, "bar"]]]]
> > 
> > Which is not at all the same as what the wiki says an OR should look like: 
> > 
> > ["or", [["eq", "google"]], [["less_than", "g"]]]
> > 
> > Apart from the obvious difference in syntax, have I been drinking mouthwash 
> > or did the wiki suddenly switch from showing a complete key filter 
> > condition to showing an individual key filter command for AND, OR, and NOT? 
> > 
> > Jeremiah
> > ___
> > riak-users mailing list
> > riak-users@lists.basho.com (mailto:riak-users@lists.basho.com)
> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak doesn't use consistent hashing

2011-05-26 Thread Jonathan Langevin
That sounds quite disconcerting. What happens to the performance of the
cluster when this occurs?


Jonathan Langevin
Systems Administrator
Loom Inc.
Wilmington, NC: (910) 241-0433 - jlange...@loomlearning.com -
www.loomlearning.com - Skype: intel352


On Thu, May 26, 2011 at 1:54 AM, Greg Nelson  wrote:

>  I've been doing some digging through the details of how a node joins a
> cluster.  When you hear that Riak uses consistent hashing, you'd expect it
> to distribute keys to nodes by hashing keys onto the ring AND hashing nodes
> onto the ring.  Keys belong to the closest node on the ring, in the
> clockwise direction.  Add a node, it hashes onto the ring and takes over
> some keys.  Ordinarily the node would hash onto the ring in several places,
> to achieve better spread.  Some data (roughly 1 / #nodes) moves to the new
> node from each of the other nodes, and everything else stays the same.
>
> In what Amazon describes as operationally simpler (strategy 3 in the Dynamo
> paper), the ring is instead divided into equally-sized partitions.  Nodes
> are hashed onto the ring, and preflists are calculated by walking clockwise
> from a partition, skipping partitions on already visited nodes.  Riak does
> something similar: it divides the ring into equally-sized partitions, then
> nodes "randomly" claim partitions.  However, the skipping bit isn't part of
> Riak's preflist calculation.  Instead, nodes claim partitions in such a way
> as to be spaced out by target_n_val, to obviate the need for skipping.
>
> Now, getting back to what happens when a node joins.  The new node
> calculates a new ring state that maintains the target_n_val invariant, as
> well as trying to keep even spread of partitions per node.  The algorithm
> (default_choose_claim) is heuristic and greedy in nature, and recursively
> transfers partitions to the new node until optimal spread is achieved,
> maintaining target_n_val along the way.  But if -- during one of those
> recursive calls -- it can't meet the target_n_val, it will throw up its
> hands and completely re-do the whole ring (by calling claim_rebalance_n).
>  Striping the partitions across nodes, in a round-robin fashion.  When that
> happens, most of the data needs to be handed off between nodes.
>
> This happens a lot, with many ring sizes.  With ring_creation_size=128
> (i.e., 128 partitions), it will happen when adding node 9 (87.5% of data
> moves), adding node 12 (82%), adding node 15 (80%), adding node 19 (94%).
>  It happens with all ring sizes >= 128 (256, 512, 1024, ...).  It appears
> that any ring_creation_size (64 by default) is safe for growing to 8 nodes
> or so.  But if you want to go beyond that...  A ring size of >= 128 with
> more than 8 nodes doesn't seem all that unusual, surely someone has hit this
> before?  I've filed a bug report here:
> https://issues.basho.com/show_bug.cgi?id=
>
> Anyway, this feels like a bit of a departure from consistent hashing.  In
> fact, could this not be replaced by normal hashing + a lookup table mapping
> intervals of the hash space to nodes?  And isn't that simply sharding?
>
> At any rate, I believe the claim algorithm can be improved to avoid those
> "throw up hands and stripe everything" scenarios.  In fact, here is such an
> implementation:  https://github.com/basho/riak_core/pull/55.  It is still
> heuristic and greedy, but it seems to do a better job of avoiding re-stripe.
>  Test results are attached in a zip on the bug linked above.  I'd love to
> get the riak_core gurus at Basho to look at this and help validate it.  It
> probably could use some cleaning up, but I want to make sure there aren't
> other invariants or considerations I'm leaving out -- besides maintaining
> target_n_val, keeping optimal partition spread, and minimizing handoff
> between ring states.
>
> -Greg
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
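For contrast with Riak's fixed-partition scheme, the "classic" consistent hashing Greg describes (nodes hashed onto the ring at several virtual points, each key owned by the next node point clockwise) can be sketched as follows. This is illustrative only and deliberately not Riak's partition-claim algorithm:

```ruby
require 'digest'

# Classic consistent hashing: hash both nodes (at many virtual points) and
# keys onto the ring; a key belongs to the first node point at or clockwise
# after its hash. Adding a node moves roughly 1/#nodes of the keys.
class HashRing
  def initialize(nodes, vnodes: 64)
    @ring = {}
    nodes.each { |n| add(n, vnodes) }
  end

  def add(node, vnodes)
    vnodes.times { |i| @ring[hash("#{node}:#{i}")] = node }
    @sorted = @ring.keys.sort
  end

  def node_for(key)
    h = hash(key)
    point = @sorted.find { |p| p >= h } || @sorted.first  # wrap around
    @ring[point]
  end

  private

  # Map a string to a point on the ring (first 32 bits of SHA-1).
  def hash(s)
    Digest::SHA1.hexdigest(s)[0, 8].to_i(16)
  end
end

ring = HashRing.new(%w[node1 node2 node3])
ring.node_for("some-key")  # deterministic pick from the ring
```

Riak instead divides the ring into ring_creation_size equal partitions and has nodes claim them, which is what makes the claim algorithm (and its re-striping behaviour) a separate concern from the hash function itself.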


Re: Riak doesn't use consistent hashing

2011-05-26 Thread Ben Tilly
Performance is fine.  However, requests get a "not found" response for an
extended period of time.  See
http://lists.basho.com/pipermail/riak-users_lists.basho.com/2011-May/thread.html#4078
for previous discussion of what sounds like the same issue.

On Thu, May 26, 2011 at 6:57 AM, Jonathan Langevin <
jlange...@loomlearning.com> wrote:

> That sounds quite disconcerting. What happens to the performance of the
> cluster when this occurs?
>
>  
>  Jonathan Langevin
> Systems Administrator
> Loom Inc.
> Wilmington, NC: (910) 241-0433 - jlange...@loomlearning.com -
> www.loomlearning.com - Skype: intel352
>
>
>
> On Thu, May 26, 2011 at 1:54 AM, Greg Nelson  wrote:
>
>>  I've been doing some digging through the details of how a node joins a
>> cluster.  When you hear that Riak uses consistent hashing, you'd expect it
>> to distribute keys to nodes by hashing keys onto the ring AND hashing nodes
>> onto the ring.  Keys belong to the closest node on the ring, in the
>> clockwise direction.  Add a node, it hashes onto the ring and takes over
>> some keys.  Ordinarily the node would hash onto the ring in several places,
>> to achieve better spread.  Some data (roughly 1 / #nodes) moves to the new
>> node from each of the other nodes, and everything else stays the same.
>>
>> In what Amazon describes as operationally simpler (strategy 3 in the
>> Dynamo paper), the ring is instead divided into equally-sized partitions.
>>  Nodes are hashed onto the ring, and preflists are calculated by walking
>> clockwise from a partition, skipping partitions on already visited nodes.
>>  Riak does something similar: it divides the ring into equally-sized
>> partitions, then nodes "randomly" claim partitions.  However, the skipping
>> bit isn't part of Riak's preflist calculation.  Instead, nodes claim
>> partitions in such a way as to be spaced out by target_n_val, to obviate the
>> need for skipping.
>>
>> Now, getting back to what happens when a node joins.  The new node
>> calculates a new ring state that maintains the target_n_val invariant, as
>> well as trying to keep even spread of partitions per node.  The algorithm
>> (default_choose_claim) is heuristic and greedy in nature, and recursively
>> transfers partitions to the new node until optimal spread is achieved,
>> maintaining target_n_val along the way.  But if -- during one of those
>> recursive calls -- it can't meet the target_n_val, it will throw up its
>> hands and completely re-do the whole ring (by calling claim_rebalance_n).
>>  Striping the partitions across nodes, in a round-robin fashion.  When that
>> happens, most of the data needs to be handed off between nodes.
>>
>> This happens a lot, with many ring sizes.  With ring_creation_size=128
>> (i.e., 128 partitions), it will happen when adding node 9 (87.5% of data
>> moves), adding node 12 (82%), adding node 15 (80%), adding node 19 (94%).
>>  It happens with all ring sizes >= 128 (256, 512, 1024, ...).  It appears
>> that any ring_creation_size (64 by default) is safe for growing to 8 nodes
>> or so.  But if you want to go beyond that...  A ring size of >= 128 with
>> more than 8 nodes doesn't seem all that unusual, surely someone has hit this
>> before?  I've filed a bug report here:
>> https://issues.basho.com/show_bug.cgi?id=
>>
>> Anyway, this feels like a bit of a departure from consistent hashing.  In
>> fact, could this not be replaced by normal hashing + a lookup table mapping
>> intervals of the hash space to nodes?  And isn't that simply sharding?
>>
>> At any rate, I believe the claim algorithm can be improved to avoid those
>> "throw up hands and stripe everything" scenarios.  In fact, here is such an
>> implementation:  https://github.com/basho/riak_core/pull/55.  It is still
>> heuristic and greedy, but it seems to do a better job of avoiding re-stripe.
>>  Test results are attached in a zip on the bug linked above.  I'd love to
>> get the riak_core gurus at Basho to look at this and help validate it.  It
>> probably could use some cleaning up, but I want to make sure there aren't
>> other invariants or considerations I'm leaving out -- besides maintaining
>> target_n_val, keeping optimal partition spread, and minimizing handoff
>> between ring states.
>>
>> -Greg
>>
>> ___
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
>>
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Riak Client Resources, Deleting a Key Doesn't Remove it from bucket.keys

2011-05-26 Thread Keith Bennett
All -

I just started working with Riak, and am using the riak-client Ruby gem.

When I delete a key from a bucket, and try to fetch the value associated with 
that key, I get a 404 error (which is reasonable).  However, it remains in the 
bucket's list of keys (i.e. the value returned by bucket.keys()).  Why is the 
key still reported to exist in the bucket? Is bucket.keys cached, and therefore 
unaware of the deletion? Here's a riak-client Ruby script and its output in irb 
that illustrates this:

ree-1.8.7-2010.02 :001 > require 'riak'
 => true 
ree-1.8.7-2010.02 :002 > 
ree-1.8.7-2010.02 :003 >   client = Riak::Client.new
 => #<Riak::Client http://127.0.0.1:8098> 
ree-1.8.7-2010.02 :004 > bucket = client['links']
 => # 
ree-1.8.7-2010.02 :005 > key = bucket.keys.first
 => "4000-17.xml" 
ree-1.8.7-2010.02 :006 > object = bucket[key]
 => # 
ree-1.8.7-2010.02 :007 > object.delete
 => # 
ree-1.8.7-2010.02 :008 > bucket.keys.first
 => "4000-17.xml" 
ree-1.8.7-2010.02 :009 > object = bucket[key]
Riak::HTTPFailedRequest: Expected [200, 300] from Riak but received 404. not 
found

from 
/Users/kbennett/.rvm/gems/ree-1.8.7-2010.02/gems/riak-client-0.9.4/lib/riak/client/net_http_backend.rb:55:in
 `perform'
from 
/Users/kbennett/.rvm/rubies/ree-1.8.7-2010.02/lib/ruby/1.8/net/http.rb:1054:in 
`request'
from 
/Users/kbennett/.rvm/rubies/ree-1.8.7-2010.02/lib/ruby/1.8/net/http.rb:2142:in 
`reading_body'
from 
/Users/kbennett/.rvm/rubies/ree-1.8.7-2010.02/lib/ruby/1.8/net/http.rb:1053:in 
`request'
from 
/Users/kbennett/.rvm/rubies/ree-1.8.7-2010.02/lib/ruby/1.8/net/http.rb:1037:in 
`request'
from 
/Users/kbennett/.rvm/rubies/ree-1.8.7-2010.02/lib/ruby/1.8/net/http.rb:543:in 
`start'
from 
/Users/kbennett/.rvm/rubies/ree-1.8.7-2010.02/lib/ruby/1.8/net/http.rb:1035:in 
`request'
from 
/Users/kbennett/.rvm/gems/ree-1.8.7-2010.02/gems/riak-client-0.9.4/lib/riak/client/net_http_backend.rb:47:in
 `perform'
from 
/Users/kbennett/.rvm/gems/ree-1.8.7-2010.02/gems/riak-client-0.9.4/lib/riak/client/net_http_backend.rb:46:in
 `tap'
from 
/Users/kbennett/.rvm/gems/ree-1.8.7-2010.02/gems/riak-client-0.9.4/lib/riak/client/net_http_backend.rb:46:in
 `perform'
from 
/Users/kbennett/.rvm/gems/ree-1.8.7-2010.02/gems/riak-client-0.9.4/lib/riak/client/http_backend/transport_methods.rb:59:in
 `get'
from 
/Users/kbennett/.rvm/gems/ree-1.8.7-2010.02/gems/riak-client-0.9.4/lib/riak/client/http_backend.rb:72:in
 `fetch_object'
from 
/Users/kbennett/.rvm/gems/ree-1.8.7-2010.02/gems/riak-client-0.9.4/lib/riak/bucket.rb:101:in
 `[]'
from riak-delete-failure.rb:9

Thanks,
Keith



___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak Client Resources, Deleting a Key Doesn't Remove it from bucket.keys

2011-05-26 Thread Sean Cribbs
Keith,

There's a pull request open for this on the GitHub project 
(https://github.com/seancribbs/ripple/pull/168). For various reasons, the list 
of keys is memoized in the Riak::Bucket instance.  Passing :reload => true to 
the #keys method will cause it to refresh.  I like to discourage list-keys, but 
with the memoized list you don't shoot yourself in the foot as often.

Sean Cribbs 
Developer Advocate
Basho Technologies, Inc.
http://basho.com/
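The memoize-with-reload pattern Sean describes can be sketched with a stub class; the real implementation lives in riak-client's Riak::Bucket, and only the `keys(:reload => true)` call shape is taken from it:

```ruby
# Sketch of memoization with an opt-in refresh: the expensive list-keys
# result is cached, and callers pass :reload to force a fresh fetch.
class Bucket
  def initialize(backend)
    @backend = backend
  end

  def keys(options = {})
    @keys = nil if options[:reload]
    @keys ||= @backend.call    # expensive list-keys runs at most once
  end
end

calls = 0
bucket = Bucket.new(-> { calls += 1; ["4000-17.xml"] })

bucket.keys                    # hits the backend
bucket.keys                    # served from the memo; backend not called
bucket.keys(:reload => true)   # forces a refresh
calls
# => 2
```

This explains Keith's observation: after a delete, the memoized list still contains the old key until a reload is requested.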

On May 26, 2011, at 10:29 AM, Keith Bennett wrote:

> All -
> 
> I just started working with Riak, and am using the riak-client Ruby gem.
> 
> When I delete a key from a bucket, and try to fetch the value associated with 
> that key, I get a 404 error (which is reasonable).  However, it remains in 
> the bucket's list of keys (i.e. the value returned by bucket.keys()).  Why is 
> the key still reported to exist in the bucket? Is bucket.keys cached, and 
> therefore unaware of the deletion? Here's a riak-client Ruby script and its 
> output in irb that illustrates this:
> 
> ree-1.8.7-2010.02 :001 > require 'riak'
> => true 
> ree-1.8.7-2010.02 :002 > 
> ree-1.8.7-2010.02 :003 >   client = Riak::Client.new
> => #http://127.0.0.1:8098> 
> ree-1.8.7-2010.02 :004 > bucket = client['links']
> => # 
> ree-1.8.7-2010.02 :005 > key = bucket.keys.first
> => "4000-17.xml" 
> ree-1.8.7-2010.02 :006 > object = bucket[key]
> => # 
> ree-1.8.7-2010.02 :007 > object.delete
> => # 
> ree-1.8.7-2010.02 :008 > bucket.keys.first
> => "4000-17.xml" 
> ree-1.8.7-2010.02 :009 > object = bucket[key]
> Riak::HTTPFailedRequest: Expected [200, 300] from Riak but received 404. not 
> found
> 
>   from 
> /Users/kbennett/.rvm/gems/ree-1.8.7-2010.02/gems/riak-client-0.9.4/lib/riak/client/net_http_backend.rb:55:in
>  `perform'
>   from 
> /Users/kbennett/.rvm/rubies/ree-1.8.7-2010.02/lib/ruby/1.8/net/http.rb:1054:in
>  `request'
>   from 
> /Users/kbennett/.rvm/rubies/ree-1.8.7-2010.02/lib/ruby/1.8/net/http.rb:2142:in
>  `reading_body'
>   from 
> /Users/kbennett/.rvm/rubies/ree-1.8.7-2010.02/lib/ruby/1.8/net/http.rb:1053:in
>  `request'
>   from 
> /Users/kbennett/.rvm/rubies/ree-1.8.7-2010.02/lib/ruby/1.8/net/http.rb:1037:in
>  `request'
>   from 
> /Users/kbennett/.rvm/rubies/ree-1.8.7-2010.02/lib/ruby/1.8/net/http.rb:543:in 
> `start'
>   from 
> /Users/kbennett/.rvm/rubies/ree-1.8.7-2010.02/lib/ruby/1.8/net/http.rb:1035:in
>  `request'
>   from 
> /Users/kbennett/.rvm/gems/ree-1.8.7-2010.02/gems/riak-client-0.9.4/lib/riak/client/net_http_backend.rb:47:in
>  `perform'
>   from 
> /Users/kbennett/.rvm/gems/ree-1.8.7-2010.02/gems/riak-client-0.9.4/lib/riak/client/net_http_backend.rb:46:in
>  `tap'
>   from 
> /Users/kbennett/.rvm/gems/ree-1.8.7-2010.02/gems/riak-client-0.9.4/lib/riak/client/net_http_backend.rb:46:in
>  `perform'
>   from 
> /Users/kbennett/.rvm/gems/ree-1.8.7-2010.02/gems/riak-client-0.9.4/lib/riak/client/http_backend/transport_methods.rb:59:in
>  `get'
>   from 
> /Users/kbennett/.rvm/gems/ree-1.8.7-2010.02/gems/riak-client-0.9.4/lib/riak/client/http_backend.rb:72:in
>  `fetch_object'
>   from 
> /Users/kbennett/.rvm/gems/ree-1.8.7-2010.02/gems/riak-client-0.9.4/lib/riak/bucket.rb:101:in
>  `[]'
>   from riak-delete-failure.rb:9
> 
> Thanks,
> Keith
> 
> 
> 
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak Client Resources, Deleting a Key Doesn't Remove it from bucket.keys

2011-05-26 Thread Jonathan Langevin
How long is the key list cached like that, naturally?


Jonathan Langevin
Systems Administrator
Loom Inc.
Wilmington, NC: (910) 241-0433 - jlange...@loomlearning.com -
www.loomlearning.com - Skype: intel352


On Thu, May 26, 2011 at 10:35 AM, Sean Cribbs  wrote:

> Keith,
>
> There was a pull-request issue out for this on the Github project (
> https://github.com/seancribbs/ripple/pull/168). For various reasons, the
> list of keys is memoized in the Riak::Bucket instance.  Passing :reload =>
> true to the #keys method will cause it to refresh.  I like to discourage
> list-keys, but with the memoized list you don't shoot yourself in the foot
> as often.
>
> Sean Cribbs 
> Developer Advocate
> Basho Technologies, Inc.
> http://basho.com/
>
> On May 26, 2011, at 10:29 AM, Keith Bennett wrote:
>
> > All -
> >
> > I just started working with Riak, and am using the riak-client Ruby gem.
> >
> > When I delete a key from a bucket, and try to fetch the value associated
> with that key, I get a 404 error (which is reasonable).  However, it remains
> in the bucket's list of keys (i.e. the value returned by bucket.keys()).  Why
> is the key still reported to exist in the bucket? Is bucket.keys cached, and
> therefore unaware of the deletion? Here's a riak-client Ruby script and its
> output in irb that illustrates this:
> >
> > ree-1.8.7-2010.02 :001 > require 'riak'
> > => true
> > ree-1.8.7-2010.02 :002 >
> > ree-1.8.7-2010.02 :003 >   client = Riak::Client.new
> > => #http://127.0.0.1:8098>
> > ree-1.8.7-2010.02 :004 > bucket = client['links']
> > => #
> > ree-1.8.7-2010.02 :005 > key = bucket.keys.first
> > => "4000-17.xml"
> > ree-1.8.7-2010.02 :006 > object = bucket[key]
> > => #
> > ree-1.8.7-2010.02 :007 > object.delete
> > => #
> > ree-1.8.7-2010.02 :008 > bucket.keys.first
> > => "4000-17.xml"
> > ree-1.8.7-2010.02 :009 > object = bucket[key]
> > Riak::HTTPFailedRequest: Expected [200, 300] from Riak but received 404.
> not found
> >
> >   from
> /Users/kbennett/.rvm/gems/ree-1.8.7-2010.02/gems/riak-client-0.9.4/lib/riak/client/net_http_backend.rb:55:in
> `perform'
> >   from
> /Users/kbennett/.rvm/rubies/ree-1.8.7-2010.02/lib/ruby/1.8/net/http.rb:1054:in
> `request'
> >   from
> /Users/kbennett/.rvm/rubies/ree-1.8.7-2010.02/lib/ruby/1.8/net/http.rb:2142:in
> `reading_body'
> >   from
> /Users/kbennett/.rvm/rubies/ree-1.8.7-2010.02/lib/ruby/1.8/net/http.rb:1053:in
> `request'
> >   from
> /Users/kbennett/.rvm/rubies/ree-1.8.7-2010.02/lib/ruby/1.8/net/http.rb:1037:in
> `request'
> >   from
> /Users/kbennett/.rvm/rubies/ree-1.8.7-2010.02/lib/ruby/1.8/net/http.rb:543:in
> `start'
> >   from
> /Users/kbennett/.rvm/rubies/ree-1.8.7-2010.02/lib/ruby/1.8/net/http.rb:1035:in
> `request'
> >   from
> /Users/kbennett/.rvm/gems/ree-1.8.7-2010.02/gems/riak-client-0.9.4/lib/riak/client/net_http_backend.rb:47:in
> `perform'
> >   from
> /Users/kbennett/.rvm/gems/ree-1.8.7-2010.02/gems/riak-client-0.9.4/lib/riak/client/net_http_backend.rb:46:in
> `tap'
> >   from
> /Users/kbennett/.rvm/gems/ree-1.8.7-2010.02/gems/riak-client-0.9.4/lib/riak/client/net_http_backend.rb:46:in
> `perform'
> >   from
> /Users/kbennett/.rvm/gems/ree-1.8.7-2010.02/gems/riak-client-0.9.4/lib/riak/client/http_backend/transport_methods.rb:59:in
> `get'
> >   from
> /Users/kbennett/.rvm/gems/ree-1.8.7-2010.02/gems/riak-client-0.9.4/lib/riak/client/http_backend.rb:72:in
> `fetch_object'
> >   from
> /Users/kbennett/.rvm/gems/ree-1.8.7-2010.02/gems/riak-client-0.9.4/lib/riak/bucket.rb:101:in
> `[]'
> >   from riak-delete-failure.rb:9
> >
> > Thanks,
> > Keith
> >
> >
> >
> > ___
> > riak-users mailing list
> > riak-users@lists.basho.com
> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak doesn't use consistent hashing.

2011-05-26 Thread Justin Sheehy
Hi, Greg.

Thanks for your thoughtful analysis and the pull request.

On Thu, May 26, 2011 at 1:54 AM, Greg Nelson  wrote:

> However, the skipping bit isn't part of
> Riak's preflist calculation.  Instead, nodes claim partitions in such a way
> as to be spaced out by target_n_val, to obviate the need for skipping.

A fun bit of history here: once upon a time, Riak's claiming worked in
the same way as described by Amazon, with "skipping" and all.  We
noticed that this approach caused a different set of operational
difficulties when hinted handoff due to node outages was occurring at
the same time as a membership change.  That prompted changes to the
claim algorithm, which we still consider an area deserving of active
improvement.
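
The spacing idea can be illustrated with a small Ruby sketch (purely illustrative; this is not Riak's actual claim code): when nodes claim partitions round-robin and the node count is at least target_n_val, any window of target_n_val consecutive partitions lands on distinct nodes, so the preflist never needs to "skip" an owner.

```ruby
# Illustrative only -- not Riak's claim algorithm. Shows why spacing
# claims by target_n_val makes preflist "skipping" unnecessary.
TARGET_N_VAL = 4

def claim(num_partitions, nodes)
  # Round-robin claiming spaces each node's partitions nodes.size apart.
  (0...num_partitions).map { |i| nodes[i % nodes.size] }
end

ring = claim(64, %w[node1 node2 node3 node4])

# Every run of TARGET_N_VAL consecutive partitions is owned by distinct
# nodes, so the first N entries of any preflist hit N different nodes.
spaced = ring.each_cons(TARGET_N_VAL).all? { |w| w.uniq.size == TARGET_N_VAL }
puts spaced
```

With fewer nodes than target_n_val (or uneven claims), some windows would repeat an owner, which is the situation the claim algorithm works to avoid.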

Multiple people will be reading, analyzing, and testing your work to
contribute to this improvement.  We very much appreciate your efforts,
and want to make sure that we incorporate them in the best possible
way.

Thanks,

-Justin

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak Client Resources, Deleting a Key Doesn't Remove it from bucket.keys

2011-05-26 Thread Keith Bennett
Sean -

Thanks for responding so quickly.  I posted a response on github 
(https://github.com/seancribbs/ripple/pull/168).

Regards,
Keith


On May 26, 2011, at 10:35 AM, Sean Cribbs wrote:

> Keith,
> 
> There was a pull-request issue out for this on the Github project 
> (https://github.com/seancribbs/ripple/pull/168). For various reasons, the 
> list of keys is memoized in the Riak::Bucket instance.  Passing :reload => 
> true to the #keys method will cause it to refresh.  I like to discourage 
> list-keys, but with the memoized list you don't shoot yourself in the foot as 
> often.
> 
> Sean Cribbs 
> Developer Advocate
> Basho Technologies, Inc.
> http://basho.com/
> 
> On May 26, 2011, at 10:29 AM, Keith Bennett wrote:
> 
>> All -
>> 
>> I just started working with Riak, and am using the riak-client Ruby gem.
>> 
>> When I delete a key from a bucket, and try to fetch the value associated 
>> with that key, I get a 404 error (which is reasonable).  However, it remains 
>> in the bucket's list of keys (i.e. the value returned by bucket.keys()).  Why 
>> is the key still reported to exist in the bucket? Is bucket.keys cached, and 
>> therefore unaware of the deletion? Here's a riak-client Ruby script and its 
>> output in irb that illustrates this:
>> 
>> ree-1.8.7-2010.02 :001 > require 'riak'
>> => true 
>> ree-1.8.7-2010.02 :002 > 
>> ree-1.8.7-2010.02 :003 >   client = Riak::Client.new
>> => #http://127.0.0.1:8098> 
>> ree-1.8.7-2010.02 :004 > bucket = client['links']
>> => # 
>> ree-1.8.7-2010.02 :005 > key = bucket.keys.first
>> => "4000-17.xml" 
>> ree-1.8.7-2010.02 :006 > object = bucket[key]
>> => # 
>> ree-1.8.7-2010.02 :007 > object.delete
>> => # 
>> ree-1.8.7-2010.02 :008 > bucket.keys.first
>> => "4000-17.xml" 
>> ree-1.8.7-2010.02 :009 > object = bucket[key]
>> Riak::HTTPFailedRequest: Expected [200, 300] from Riak but received 404. not 
>> found
>> 
>>  from 
>> /Users/kbennett/.rvm/gems/ree-1.8.7-2010.02/gems/riak-client-0.9.4/lib/riak/client/net_http_backend.rb:55:in
>>  `perform'
>>  from 
>> /Users/kbennett/.rvm/rubies/ree-1.8.7-2010.02/lib/ruby/1.8/net/http.rb:1054:in
>>  `request'
>>  from 
>> /Users/kbennett/.rvm/rubies/ree-1.8.7-2010.02/lib/ruby/1.8/net/http.rb:2142:in
>>  `reading_body'
>>  from 
>> /Users/kbennett/.rvm/rubies/ree-1.8.7-2010.02/lib/ruby/1.8/net/http.rb:1053:in
>>  `request'
>>  from 
>> /Users/kbennett/.rvm/rubies/ree-1.8.7-2010.02/lib/ruby/1.8/net/http.rb:1037:in
>>  `request'
>>  from 
>> /Users/kbennett/.rvm/rubies/ree-1.8.7-2010.02/lib/ruby/1.8/net/http.rb:543:in
>>  `start'
>>  from 
>> /Users/kbennett/.rvm/rubies/ree-1.8.7-2010.02/lib/ruby/1.8/net/http.rb:1035:in
>>  `request'
>>  from 
>> /Users/kbennett/.rvm/gems/ree-1.8.7-2010.02/gems/riak-client-0.9.4/lib/riak/client/net_http_backend.rb:47:in
>>  `perform'
>>  from 
>> /Users/kbennett/.rvm/gems/ree-1.8.7-2010.02/gems/riak-client-0.9.4/lib/riak/client/net_http_backend.rb:46:in
>>  `tap'
>>  from 
>> /Users/kbennett/.rvm/gems/ree-1.8.7-2010.02/gems/riak-client-0.9.4/lib/riak/client/net_http_backend.rb:46:in
>>  `perform'
>>  from 
>> /Users/kbennett/.rvm/gems/ree-1.8.7-2010.02/gems/riak-client-0.9.4/lib/riak/client/http_backend/transport_methods.rb:59:in
>>  `get'
>>  from 
>> /Users/kbennett/.rvm/gems/ree-1.8.7-2010.02/gems/riak-client-0.9.4/lib/riak/client/http_backend.rb:72:in
>>  `fetch_object'
>>  from 
>> /Users/kbennett/.rvm/gems/ree-1.8.7-2010.02/gems/riak-client-0.9.4/lib/riak/bucket.rb:101:in
>>  `[]'
>>  from riak-delete-failure.rb:9
>> 
>> Thanks,
>> Keith
>> 
>> 
>> 
> 


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak Client Resources, Deleting a Key Doesn't Remove it from bucket.keys

2011-05-26 Thread Sean Cribbs
With recent commits ( 
https://github.com/seancribbs/ripple/compare/35d7323fb0e179c8c971...da3ab71a19d194c65a7b
 ), it is cached until you either refresh it manually by passing :reload => 
true or a block (for streaming key lists). This was the compromise reached in 
that pull-request.
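
The memoization behavior under discussion can be sketched like this (a hypothetical mock mirroring what the thread describes about Riak::Bucket#keys and its :reload option; the fake store below stands in for Riak and is not the gem's actual internals):

```ruby
# Hypothetical sketch of the key-list memoization discussed in this
# thread. FakeStore stands in for Riak and counts list-keys requests.
class Bucket
  def initialize(store)
    @store = store
  end

  # First call fetches and caches the key list; :reload => true forces
  # a fresh (expensive) list-keys request.
  def keys(options = {})
    @keys = nil if options[:reload]
    @keys ||= @store.list_keys
  end
end

FakeStore = Struct.new(:calls) do
  def list_keys
    self.calls += 1
    %w[a b c]
  end
end

store = FakeStore.new(0)
bucket = Bucket.new(store)
bucket.keys                   # hits the store and caches
bucket.keys                   # served from the cache
bucket.keys(:reload => true)  # hits the store again
puts store.calls              # => 2
```

Only the reload call pays the list-keys cost a second time, which is exactly the trade-off the thread is debating.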

All of this caching discussion glosses over the fact that you should not list 
keys in any real application. It really begs the question -- how often do you 
list keys in Redis, or memcached?  I suspect that generally you don't.  This 
isn't a relational database. (Also, how often do you actually do a full-table 
scan in MySQL? You don't if you're sane -- you use an index, or even LIMIT + 
OFFSET.)

I'm tempted to remove Document::all and make Bucket#keys harder to access, but 
the balance between discouraging bad behavior and exposing available 
functionality is a hard one to strike. I don't want new developers to 
immediately use list-keys and then be discouraged from using Riak because it's 
slow; on the other hand, it can be useful in some circumstances.  In those 
cases where it's useful, the developer should probably be responsible enough to 
request the key list only once; the caching behavior simply does this for them. 
I guess whether it should do this for them is the issue at hand.

All that said, I'm really torn on this issue, and the same problem applies to 
full-bucket MapReduce. Caveat emptor.

Sean Cribbs 
Developer Advocate
Basho Technologies, Inc.
http://basho.com/

On May 26, 2011, at 10:35 AM, Jonathan Langevin wrote:

> How long is the key list cached like that, naturally?
> 
> 
> Jonathan Langevin
> Systems Administrator
> Loom Inc.
> Wilmington, NC: (910) 241-0433 - jlange...@loomlearning.com - 
> www.loomlearning.com - Skype: intel352
> 
> 
> 
On Thu, May 26, 2011 at 10:35 AM, Sean Cribbs wrote:
> Keith,
> 
> There was a pull-request issue out for this on the Github project 
> (https://github.com/seancribbs/ripple/pull/168). For various reasons, the 
> list of keys is memoized in the Riak::Bucket instance.  Passing :reload => 
> true to the #keys method will cause it to refresh.  I like to discourage 
> list-keys, but with the memoized list you don't shoot yourself in the foot as 
> often.
> 
> Sean Cribbs 
> Developer Advocate
> Basho Technologies, Inc.
> http://basho.com/
> 
> On May 26, 2011, at 10:29 AM, Keith Bennett wrote:
> 
> > All -
> >
> > I just started working with Riak, and am using the riak-client Ruby gem.
> >
> > When I delete a key from a bucket, and try to fetch the value associated 
> > with that key, I get a 404 error (which is reasonable).  However, it 
> > remains in the bucket's list of keys (i.e. the value returned by 
> > bucket.keys().  Why is the key still reported to exist in the bucket? Is 
> > bucket.keys cached, and therefore unaware of the deletion? Here's a 
> > riak-client Ruby script and its output in irb that illustrates this:
> >
> > ree-1.8.7-2010.02 :001 > require 'riak'
> > => true
> > ree-1.8.7-2010.02 :002 >
> > ree-1.8.7-2010.02 :003 >   client = Riak::Client.new
> > => #http://127.0.0.1:8098>
> > ree-1.8.7-2010.02 :004 > bucket = client['links']
> > => #
> > ree-1.8.7-2010.02 :005 > key = bucket.keys.first
> > => "4000-17.xml"
> > ree-1.8.7-2010.02 :006 > object = bucket[key]
> > => #
> > ree-1.8.7-2010.02 :007 > object.delete
> > => #
> > ree-1.8.7-2010.02 :008 > bucket.keys.first
> > => "4000-17.xml"
> > ree-1.8.7-2010.02 :009 > object = bucket[key]
> > Riak::HTTPFailedRequest: Expected [200, 300] from Riak but received 404. 
> > not found
> >

Re: Riak Client Resources, Deleting a Key Doesn't Remove it from bucket.keys

2011-05-26 Thread Aphyr
Agreed. In fact, jrecursive pointed out to me last week that vnode 
operations are synchronous. That means that when you call list-keys, not 
only is it going to take a long time (right now upwards of 5 minutes) to 
complete, but while each vnode is returning its list of keys *it blocks 
any other requests*.


While list-keys is an unfortunate necessity for some things, its use 
should be minimized if you're going to get to any appreciable (100M 
keys) scale. I don't even know how we're going to use it at all above a 
billion. Possibly by listing the keys periodically from bitcask 
directly, and maintaining an index ourselves.


--Kyle

On 05/26/2011 09:40 AM, Sean Cribbs wrote:

With recent commits (
https://github.com/seancribbs/ripple/compare/35d7323fb0e179c8c971...da3ab71a19d194c65a7b

), it is cached until you either refresh it manually by passing :reload
=> true or a block (for streaming key lists). This was the compromise
reached in that pull-request.

All of this caching discussion glosses over the fact that you *should
not list keys* in any real application. It really begs the question --
how often do you list keys in Redis, or memcached? I suspect that
generally you don't. This isn't a relational database. (Also, how often
do you actually do a full-table scan in MySQL? You don't if you're sane
-- you use an index, or even LIMIT + OFFSET.)

I'm tempted to remove Document::all and make Bucket#keys harder to
access, but the balance between discouraging bad behavior and exposing
available functionality is a hard one to strike. I don't want new
developers to immediately use list-keys and then be discouraged from
using Riak because it's slow; on the other hand, it /can be useful/ in
some circumstances. In those cases where it's useful, the developer
should probably be responsible enough to request the key list only once;
the caching behavior simply does this for them. I guess whether it
/should/ do this for them is the issue at hand.

All that said, I'm really torn on this issue, and the same problem
applies to full-bucket MapReduce. Caveat emptor.

Sean Cribbs <s...@basho.com>
Developer Advocate
Basho Technologies, Inc.
http://basho.com/

On May 26, 2011, at 10:35 AM, Jonathan Langevin wrote:


How long is the key list cached like that, naturally?

Jonathan Langevin
Systems Administrator
Loom Inc.
Wilmington, NC: (910) 241-0433 - jlange...@loomlearning.com - www.loomlearning.com - Skype: intel352

On Thu, May 26, 2011 at 10:35 AM, Sean Cribbs <s...@basho.com> wrote:

Keith,

There was a pull-request issue out for this on the Github project
(https://github.com/seancribbs/ripple/pull/168). For various
reasons, the list of keys is memoized in the Riak::Bucket
instance. Passing :reload => true to the #keys method will cause
it to refresh. I like to discourage list-keys, but with the
memoized list you don't shoot yourself in the foot as often.

Sean Cribbs <s...@basho.com>
Developer Advocate
Basho Technologies, Inc.
http://basho.com/

On May 26, 2011, at 10:29 AM, Keith Bennett wrote:

> All -
>
> I just started working with Riak, and am using the riak-client
Ruby gem.
>
> When I delete a key from a bucket, and try to fetch the value
associated with that key, I get a 404 error (which is reasonable).
However, it remains in the bucket's list of keys (i.e. the value
returned by bucket.keys(). Why is the key still reported to exist
in the bucket? Is bucket.keys cached, and therefore unaware of the
deletion? Here's a riak-client Ruby script and its output in irb
that illustrates this:
>
> ree-1.8.7-2010.02 :001 > require 'riak'
> => true
> ree-1.8.7-2010.02 :002 >
> ree-1.8.7-2010.02 :003 > client = Riak::Client.new
> => #http://127.0.0.1:8098 >
> ree-1.8.7-2010.02 :004 > bucket = client['links']
> => #
> ree-1.8.7-2010.02 :005 > key = bucket.keys.first
> => "4000-17.xml"
> ree-1.8.7-2010.02 :006 > object = bucket[key]
> => #
> ree-1.8.7-2010.02 :007 > object.delete
> => #
> ree-1.8.7-2010.02 :008 > bucket.keys.first
> => "4000-17.xml"
> ree-1.8.7-2010.02 :009 > object = bucket[key]
> Riak::HTTPFailedRequest: Expected [200, 300] from Riak but
received 404. not found
>

Re: Riak Client Resources, Deleting a Key Doesn't Remove it from bucket.keys

2011-05-26 Thread Keith Bennett

On May 26, 2011, at 12:40 PM, Sean Cribbs wrote:

> With recent commits ( 
> https://github.com/seancribbs/ripple/compare/35d7323fb0e179c8c971...da3ab71a19d194c65a7b
>  ), it is cached until you either refresh it manually by passing :reload => 
> true or a block (for streaming key lists). This was the compromise reached in 
> that pull-request.
> 
> All of this caching discussion glosses over the fact that you should not list 
> keys in any real application. It really begs the question -- how often do you 
> list keys in Redis, or memcached?  I suspect that generally you don't.  This 
> isn't a relational database. (Also, how often do you actually do a full-table 
> scan in MySQL? You don't if you're sane -- you use an index, or even LIMIT + 
> OFFSET.)
> 
> I'm tempted to remove Document::all and make Bucket#keys harder to access, 
> but the balance between discouraging bad behavior and exposing available 
> functionality is a hard one to strike. I don't want new developers to 
> immediately use list-keys and then be discouraged from using Riak because 
> it's slow; on the other hand, it can be useful in some circumstances.  


> In those cases where it's useful, the developer should probably be 
> responsible enough to request the key list only once; the caching behavior 
> simply does this for them. I guess whether it should do this for them is the 
> issue at hand.

YES!  Exactly!  The decision to expose the functionality has been made; 
questioning whether or not this should have been done is orthogonal to whether 
or not the results should be cached, and the two should be considered 
separately.  

Regarding the latter, the function name represents an implied promise to the 
caller; my position is that the function's behavior is a substantial and 
surprising deviation from that implied promise.

Although buckets do not *contain* key/values in the Riak *implementation*, the 
bucket / key-value containment metaphor pervades the developer interface, 
evidenced by, for example, the existence of the Riak::Bucket class and the 
structure of the URLs with which values are manipulated.  In software products 
that have containment metaphors, how often do we see a function return a cached 
value rather than the up-to-date value, especially for products that manage 
shared data?

> All that said, I'm really torn on this issue, and the same problem applies to 
> full-bucket MapReduce. Caveat emptor.
> 

Ok, I'll be quiet now. ;)

Thanks,
Keith


> Sean Cribbs 
> Developer Advocate
> Basho Technologies, Inc.
> http://basho.com/
> 
> On May 26, 2011, at 10:35 AM, Jonathan Langevin wrote:
> 
>> How long is the key list cached like that, naturally?
>> 
>> 
>> Jonathan Langevin
>> Systems Administrator
>> Loom Inc.
>> Wilmington, NC: (910) 241-0433 - jlange...@loomlearning.com - 
>> www.loomlearning.com - Skype: intel352
>> 
>> 
>> 
>> On Thu, May 26, 2011 at 10:35 AM, Sean Cribbs wrote:
>> Keith,
>> 
>> There was a pull-request issue out for this on the Github project 
>> (https://github.com/seancribbs/ripple/pull/168). For various reasons, the 
>> list of keys is memoized in the Riak::Bucket instance.  Passing :reload => 
>> true to the #keys method will cause it to refresh.  I like to discourage 
>> list-keys, but with the memoized list you don't shoot yourself in the foot 
>> as often.
>> 
>> Sean Cribbs 
>> Developer Advocate
>> Basho Technologies, Inc.
>> http://basho.com/
>> 
>> On May 26, 2011, at 10:29 AM, Keith Bennett wrote:
>> 
>> > All -
>> >
>> > I just started working with Riak, and am using the riak-client Ruby gem.
>> >
>> > When I delete a key from a bucket, and try to fetch the value associated 
>> > with that key, I get a 404 error (which is reasonable).  However, it 
>> > remains in the bucket's list of keys (i.e. the value returned by 
>> > bucket.keys().  Why is the key still reported to exist in the bucket? Is 
>> > bucket.keys cached, and therefore unaware of the deletion? Here's a 
>> > riak-client Ruby script and its output in irb that illustrates this:
>> >
>> > ree-1.8.7-2010.02 :001 > require 'riak'
>> > => true
>> > ree-1.8.7-2010.02 :002 >
>> > ree-1.8.7-2010.02 :003 >   client = Riak::Client.new
>> > => #http://127.0.0.1:8098>
>> > ree-1.8.7-2010.02 :004 > bucket = client['links']
>> > => #
>> > ree-1.8.7-2010.02 :005 > key = bucket.keys.first
>> > => "4000-17.xml"
>> > ree-1.8.7-2010.02 :006 > object = bucket[key]
>> > => #
>> > ree-1.8.7-2010.02 :007 > object.delete
>> > => #
>> > ree-1.8.7-2010.02 :008 > bucket.keys.first
>> > => "4000-17.xml"
>> > ree-1.8.7-2010.02 :009 > object = bucket[key]
>> > Riak::HTTPFailedRequest: Expected [200, 300] from Riak but received 404. 
>> > not found
>> >

Re: Riak Client Resources, Deleting a Key Doesn't Remove it from bucket.keys

2011-05-26 Thread Jonathan Langevin
A cache seems legitimate for performance, but perhaps the cache could
additionally be maintained for inserts/deletes?
At least then the cache is still being used, but is also accurate.

I don't know how expensive that would be though, but hopefully less
expensive than a key list reload, correct?
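
Jonathan's write-through idea could look roughly like this (a hypothetical sketch: CachedBucket and the in-memory store are stand-ins for illustration, not riak-client code). The memoized list is updated on insert/delete instead of being reloaded, so it stays accurate:

```ruby
# Hypothetical write-through cache sketch (not riak-client code): the
# memoized key list is maintained on insert/delete instead of reloaded.
class InMemoryStore
  def initialize; @data = {}; end
  def list_keys;  @data.keys;      end
  def put(k, v);  @data[k] = v;    end
  def delete(k);  @data.delete(k); end
end

class CachedBucket
  def initialize(store)
    @store = store
  end

  def keys
    @keys ||= @store.list_keys.dup  # fetched once, then maintained below
  end

  def put(key, value)
    @store.put(key, value)
    keys << key unless keys.include?(key)  # keep cache accurate on insert
  end

  def delete(key)
    @store.delete(key)
    keys.delete(key)                       # keep cache accurate on delete
  end
end

bucket = CachedBucket.new(InMemoryStore.new)
bucket.put('a', 1)
bucket.put('b', 2)
bucket.delete('a')
puts bucket.keys.inspect  # => ["b"]
```

This only keeps the cache honest for writes made through the same client instance; writes from other clients or nodes would still go unseen, which is one reason such a cache might be opt-in.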

Jonathan Langevin
Systems Administrator
Loom Inc.
Wilmington, NC: (910) 241-0433 - jlange...@loomlearning.com - www.loomlearning.com - Skype: intel352


On Thu, May 26, 2011 at 1:18 PM, Keith Bennett <
keith.benn...@lmnsolutions.com> wrote:

>
> On May 26, 2011, at 12:40 PM, Sean Cribbs wrote:
>
> With recent commits (
> https://github.com/seancribbs/ripple/compare/35d7323fb0e179c8c971...da3ab71a19d194c65a7b
>  ),
> it is cached until you either refresh it manually by passing :reload => true
> or a block (for streaming key lists). This was the compromise reached in
> that pull-request.
>
> All of this caching discussion glosses over the fact that you *should
> not list keys* in any real application. It really begs the question -- how
> often do you list keys in Redis, or memcached?  I suspect that generally you
> don't.  This isn't a relational database. (Also, how often do you actually
> do a full-table scan in MySQL? You don't if you're sane -- you use an index,
> or even LIMIT + OFFSET.)
>
> I'm tempted to remove Document::all and make Bucket#keys harder to access,
> but the balance between discouraging bad behavior and exposing available
> functionality is a hard one to strike. I don't want new developers to
> immediately use list-keys and then be discouraged from using Riak because
> it's slow; on the other hand, it *can be useful* in some circumstances.
>
>
>
> In those cases where it's useful, the developer should probably be
> responsible enough to request the key list only once; the caching behavior
> simply does this for them. I guess whether it *should* do this for them is
> the issue at hand.
>
>
> YES!  Exactly!  The decision to expose the functionality has been made;
> questioning whether or not this should have been done is orthogonal to
> whether or not the results should be cached, and the two should be
> considered separately.
>
> Regarding the latter, the function name represents an implied promise to
> the caller; my position is that the function's behavior is a substantial and
> surprising deviation from that implied promise.
>
> Although buckets do not *contain* key/values in the riak *implementation*,
> the bucket / key-value containment metaphor pervades the developer
> interface, evidenced by, for example, the existence of the Riak::Bucket
> class, and the structure of the URL's with which values are manipulated.  In
> software products that have containment metaphors, how often do we see a
> function return a cached value rather than the up-to-date value, especially
> for products that manage shared data?
>
> All that said, I'm really torn on this issue, and the same problem applies
> to full-bucket MapReduce. Caveat emptor.
>
>
> Ok, I'll be quiet now. ;)
>
> Thanks,
> Keith
>
>
> Sean Cribbs 
> Developer Advocate
> Basho Technologies, Inc.
> http://basho.com/
>
> On May 26, 2011, at 10:35 AM, Jonathan Langevin wrote:
>
> How long is the key list cached like that, naturally?
>
> Jonathan Langevin
> Systems Administrator
> Loom Inc.
> Wilmington, NC: (910) 241-0433 - jlange...@loomlearning.com -
> www.loomlearning.com - Skype: intel352
>
>
> On Thu, May 26, 2011 at 10:35 AM, Sean Cribbs wrote:
>
>> Keith,
>>
>> There was a pull-request issue out for this on the Github project (
>> https://github.com/seancribbs/ripple/pull/168). For various reasons, the
>> list of keys is memoized in the Riak::Bucket instance.  Passing :reload =>
>> true to the #keys method will cause it to refresh.  I like to discourage
>> list-keys, but with the memoized list you don't shoot yourself in the foot
>> as often.
>>
>> Sean Cribbs 
>> Developer Advocate
>> Basho Technologies, Inc.
>> http://basho.com/
>>
>> On May 26, 2011, at 10:29 AM, Keith Bennett wrote:
>>
>> > All -
>> >
>> > I just started working with Riak, and am using the riak-client Ruby gem.
>> >
>> > When I delete a key from a bucket, and try to fetch the value associated
>> with that key, I get a 404 error (which is reasonable).  However, it remains
>> in the bucket's list of keys (i.e. the value returned by bucket.keys().  Why
>> is the key still reported to exist in the bucket? Is bucket.keys cached, and
>> therefore unaware of the deletion? Here's a riak-client Ruby script and its
>> output in irb that illustrates this:
>> >
>> > ree-1.8.7-2010.02 :001 > require 'riak'
>> > => true
>> > ree-1.8.7-2010.02 :002 >
>> > ree-1.8.7-2010.02 :003 >   client = Riak::Client.new
>> > => #http://127.0.0.1:8098>
>> > ree-1.8.7-2010.02 :004 > bucket = client['links']
>> > => #
>> > ree-1.8.7-2010.02 :005 > key = bucket.keys.first
>> > => "4000-17.xml"
>> > ree-1.8.7-2010.02 :006 > o

Re: Riak Client Resources, Deleting a Key Doesn't Remove it from bucket.keys

2011-05-26 Thread Jonathan Langevin
Additionally, perhaps the automatically updating cache (regarding
inserts/deletes) could be an optionally enabled behavior?
As there are cases where it could be needlessly expensive (i.e. - high
write/delete scenarios), especially when someone does not use the key
listing feature.

Jonathan Langevin
Systems Administrator
Loom Inc.
Wilmington, NC: (910) 241-0433 - jlange...@loomlearning.com - www.loomlearning.com - Skype: intel352


On Thu, May 26, 2011 at 1:23 PM, Jonathan Langevin <
jlange...@loomlearning.com> wrote:

> A cache seems legitimate for performance, but perhaps the cache could
> additionally be maintained for inserts/deletes?
> At least then the cache is still being used, but is also accurate.
>
> I don't know how expensive that would be though, but hopefully less
> expensive than a key list reload, correct?
>
> Jonathan Langevin
> Systems Administrator
> Loom Inc.
> Wilmington, NC: (910) 241-0433 - jlange...@loomlearning.com -
> www.loomlearning.com - Skype: intel352
>
>
> On Thu, May 26, 2011 at 1:18 PM, Keith Bennett <
> keith.benn...@lmnsolutions.com> wrote:
>
>>
>> On May 26, 2011, at 12:40 PM, Sean Cribbs wrote:
>>
>> With recent commits (
>> https://github.com/seancribbs/ripple/compare/35d7323fb0e179c8c971...da3ab71a19d194c65a7b
>>  ),
>> it is cached until you either refresh it manually by passing :reload => true
>> or a block (for streaming key lists). This was the compromise reached in
>> that pull-request.
>>
>> All of this caching discussion glosses over the fact that you *should
>> not list keys* in any real application. It really begs the question --
>> how often do you list keys in Redis, or memcached?  I suspect that generally
>> you don't.  This isn't a relational database. (Also, how often do you
>> actually do a full-table scan in MySQL? You don't if you're sane -- you use
>> an index, or even LIMIT + OFFSET.)
>>
>> I'm tempted to remove Document::all and make Bucket#keys harder to access,
>> but the balance between discouraging bad behavior and exposing available
>> functionality is a hard one to strike. I don't want new developers to
>> immediately use list-keys and then be discouraged from using Riak because
>> it's slow; on the other hand, it *can be useful* in some circumstances.
>>
>>
>>
>> In those cases where it's useful, the developer should probably be
>> responsible enough to request the key list only once; the caching behavior
>> simply does this for them. I guess whether it *should* do this for them
>> is the issue at hand.
>>
>>
>> YES!  Exactly!  The decision to expose the functionality has been made;
>> questioning whether or not this should have been done is orthogonal to
>> whether or not the results should be cached, and the two should be
>> considered separately.
>>
>> Regarding the latter, the function name represents an implied promise to
>> the caller; my position is that the function's behavior is a substantial and
>> surprising deviation from that implied promise.
>>
>> Although buckets do not *contain* key/values in the riak *implementation*,
>> the bucket / key-value containment metaphor pervades the developer
>> interface, evidenced by, for example, the existence of the Riak::Bucket
>> class, and the structure of the URL's with which values are manipulated.  In
>> software products that have containment metaphors, how often do we see a
>> function return a cached value rather than the up-to-date value, especially
>> for products that manage shared data?
>>
>> All that said, I'm really torn on this issue, and the same problem applies
>> to full-bucket MapReduce. Caveat emptor.
>>
>>
>> Ok, I'll be quiet now. ;)
>>
>> Thanks,
>> Keith
>>
>>
>>  Sean Cribbs 
>> Developer Advocate
>> Basho Technologies, Inc.
>> http://basho.com/
>>
>> On May 26, 2011, at 10:35 AM, Jonathan Langevin wrote:
>>
>> How long is the key list cached like that, naturally?
>>
>> Jonathan Langevin
>> Systems Administrator
>> Loom Inc.
>> Wilmington, NC: (910) 241-0433 - jlange...@loomlearning.com -
>> www.loomlearning.com - Skype: intel352
>>
>>
>> On Thu, May 26, 2011 at 10:35 AM, Sean Cribbs wrote:
>>
>>> Keith,
>>>
>>> There was a pull-request issue out for this on the Github project (
>>> https://github.com/seancribbs/ripple/pull/168). For various reasons, the
>>> list of keys is memoized in the Riak::Bucket instance.  Passing :reload =>
>>> true to the #keys method will cause it to refresh.  I like to discourage
>>> list-keys, but with the memoized list you don't shoot yourself in the foot
>>> as often.
>>>
>>> Sean Cribbs 
>>> Developer Advocate
>>> Basho Technologies, Inc.
>>> http://basho.com/
>>>
>>> On May 26, 2011, at 10:29 AM, Keith Bennett wrote:
>>>
>>> > All -
>>> >
>>> > I just started working with Riak, and am using the riak-client Ruby
>>> gem.
>>> >
>>> > When I delete a key from a bucket, and try to fetch the value
>>> associated with that

Re: Riak Client Resources, Deleting a Key Doesn't Remove it from bucket.keys

2011-05-26 Thread Aphyr

In software products that have containment metaphors, how often do we
see a function return a cached value rather than the up-to-date
value, especially for products that manage shared data?


Pretty frequently, actually. Every Ruby ORM I've used caches
associations by default. Even when listing is cheap, deserialization isn't.

For some reason this never tripped me up; I recall looking at the rdoc
and finding it quite obvious that this method had a :reload option. But 
if you just guessed at the existence of #keys, it is probably (like many 
things about Riak) surprising! :)


--Kyle

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Riak KV ETS Backend?

2011-05-26 Thread Jordan West
Hi all,

I'm posting to the list for the first time after talking with Mark
(ironically, at MongoSF the other day), about a use-case I'm considering
Riak for. Quickly, I should probably say I've only played with Riak on a
side project or two, and dabbled with Riak Core thanks to the new GitHub
blog. Part of the reason I'm writing the list is to determine if for my
use case it's worth investing significant time learning and prototyping some
possible implementations with Riak(-Core). Anyway, on to the question:

Is the ETS Backend in the riak_kv repo something maintained and endorsed for
use by the Basho team? I have a dataset which is well suited for a k/v store
and needs to be eventually consistent across a cluster of Erlang nodes. Riak
crossed my mind because of those reasons. Also, being able to embed it into
my application like Mnesia is a plus. However, I couldn't care less in this
case about losing data. The data is ephemeral anyway, as it's a map from a
per-connection string to a couple of Erlang process IDs, and hence I want to
store it in memory (also for performance reasons).
This data is very read-heavy, but must be available for writes or new
connections will begin to fail. I know this is not a typical use for Riak,
since one of its most touted benefits is not losing data but would it be too
far of a stretch to use Riak as an eventually consistent, in-memory
key-value store embedded into an Erlang application? If not, I figure this
is something I could build with Riak Core. Does anyone know if there is
already an OSS project I could contribute to instead of starting from
scratch in that case?

Thanks for your help,

Jordan West


Re: Riak Client Resources, Deleting a Key Doesn't Remove it from bucket.keys

2011-05-26 Thread Sean Cribbs
Kyle, you bring up a good point that I feel strongly about -- most people use 
list-keys as a substitute for better solutions (also known as an anti-pattern). 
 In most cases what they really need is one of:

* Better key/schema design, so keys are at least guessable if not knowable.
* Secondary indexes, which can sometimes be built manually, but are also in the 
product roadmap.
* Full-text search.

None of these require list keys, but all of them require strong knowledge of 
your problem domain and creative thinking.

With all of this discussion, it has been pointed out to me that there are two 
issues at hand, possibly conflated as one:

* Which causes the least surprise: caching the key list, or incurring the large 
cost of the operation each time? Or is the real issue that it is apparently 
performant in development (small numbers of keys) but not in production (large 
numbers of keys)?
* How can we better discourage use of list-keys while still exposing it to 
developers who can handle the performance hit (or enjoy holes in their feet)?
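The key-list caching under debate can be sketched as a simple memoize-with-reload pattern. This is an illustrative Ruby sketch with hypothetical names (KeyCache, a fetcher block), not ripple's actual implementation:

```ruby
# Memoized key list with an explicit reload escape hatch: the expensive
# list-keys call runs once, and is repeated only on an explicit request.
class KeyCache
  def initialize(&fetcher)
    @fetcher = fetcher # the expensive operation, e.g. list-keys over HTTP
    @keys = nil
  end

  # Returns the cached list; pass reload: true to force a refetch.
  def keys(reload: false)
    @keys = nil if reload
    @keys ||= @fetcher.call
  end
end

calls = 0
cache = KeyCache.new { calls += 1; %w[a b c] }
cache.keys                # runs the fetcher
cache.keys                # served from the cache
cache.keys(reload: true)  # forces a refetch
calls                     # => 2
```

The trade-off in the thread is exactly this: the memoized list protects careless callers from repeating an expensive call, at the price of potentially stale results.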

Sean Cribbs 
Developer Advocate
Basho Technologies, Inc.
http://basho.com/

On May 26, 2011, at 12:52 PM, Aphyr wrote:

> Agreed. In fact, jrecursive pointed out to me last week that vnode operations 
> are synchronous. That means that when you call list-keys, not only is it 
> going to take a long time (right now upwards of 5 minutes) to complete, but 
> while each vnode is returning its list of keys *it blocks any other requests*.
> 
> While list-keys is an unfortunate necessity for some things, its use should 
> be minimized if you're going to get to any appreciable (100M keys) scale. I 
> don't even know how we're going to use it at all above a billion. Possibly by 
> listing the keys periodically from bitcask directly, and maintaining an index 
> ourselves.
> 
> --Kyle
> 
> On 05/26/2011 09:40 AM, Sean Cribbs wrote:
>> With recent commits (
>> https://github.com/seancribbs/ripple/compare/35d7323fb0e179c8c971...da3ab71a19d194c65a7b
>> 
>> ), it is cached until you either refresh it manually by passing :reload
>> => true or a block (for streaming key lists). This was the compromise
>> reached in that pull-request.
>> 
>> All of this caching discussion glosses over the fact that you *should
>> not list keys* in any real application. It really raises the question --
>> how often do you list keys in Redis, or memcached? I suspect that
>> generally you don't. This isn't a relational database. (Also, how often
>> do you actually do a full-table scan in MySQL? You don't if you're sane
>> -- you use an index, or even LIMIT + OFFSET.)
>> 
>> I'm tempted to remove Document::all and make Bucket#keys harder to
>> access, but the balance between discouraging bad behavior and exposing
>> available functionality is a hard one to strike. I don't want new
>> developers to immediately use list-keys and then be discouraged from
>> using Riak because it's slow; on the other hand, it /can be useful/ in
>> some circumstances. In those cases where it's useful, the developer
>> should probably be responsible enough to request the key list only once;
>> the caching behavior simply does this for them. I guess whether it
>> /should/ do this for them is the issue at hand.
>> 
>> All that said, I'm really torn on this issue, and the same problem
>> applies to full-bucket MapReduce. Caveat emptor.
>> 
>> Sean Cribbs (s...@basho.com)
>> Developer Advocate
>> Basho Technologies, Inc.
>> http://basho.com/
>> 
>> On May 26, 2011, at 10:35 AM, Jonathan Langevin wrote:
>> 
>>> How long is the key list cached like that, naturally?
>>> 
>>> Jonathan Langevin
>>> Systems Administrator
>>> Loom Inc.
>>> Wilmington, NC: (910) 241-0433 - jlange...@loomlearning.com -
>>> www.loomlearning.com - Skype: intel352
>>> 
>>> On Thu, May 26, 2011 at 10:35 AM, Sean Cribbs (s...@basho.com) wrote:
>>>Keith,
>>> 
>>>There was a pull-request issue out for this on the Github project
>>>(https://github.com/seancribbs/ripple/pull/168). For various
>>>reasons, the list of keys is memoized in the Riak::Bucket
>>>instance. Passing :reload => true to the #keys method will cause
>>>it to refresh. I like to discourage list-keys, but with the
>>>memoized list you don't shoot yourself in the foot as often.
>>> 
>>>Sean Cribbs (s...@basho.com)
>>>Developer Advocate
>>>Basho Technologies, Inc.
>>>http://basho.com/
>>> 
>>>On May 26, 2011, at 10:29 AM, Keith Bennett wrote:
>>> 
>>>> All -
>>>>
>>>> I just started working with Riak, and am using the riak-client
>>>Ruby gem.
>>>>
>>>> When I delete a key from a bucket, and try to fetch the value
>>>associated with that key, I get a 404 error (which is reasonable).
>>>Howeve

Re: Riak Client Resources, Deleting a Key Doesn't Remove it from bucket.keys

2011-05-26 Thread Mike Oxford
On Thu, May 26, 2011 at 11:21 AM, Sean Cribbs  wrote:

> With all of this discussion it has been pointed out to me there are two
> issues at hand, possibly conflated as one:
>
> * Which causes the least surprise: caching the key list, or incurring the
> large cost of the operation each time? Or is the real issue that it is
> apparently performant in development (small numbers of keys) but not in
> production (large numbers of keys)?
> * How can we better discourage use of list-keys while still exposing it to
> developers who can handle the performance hit (or enjoy holes in their
> feet)?
>
>
1)  I would rather be hit by a large cost that I can see and feel instead of
trying to run down hidden keys from a stale cache (reflect on chasing memory
corruptions...)
2)  Make a riak-configuration value to enable or disable it.  You have to
explicitly go turn it on to use it.   It's more of an "if you turn this on,
you implicitly accept the penalties and issues surrounding actually using
it."

-mox


Re: Riak doesn't use consistent hashing.

2011-05-26 Thread Greg Nelson
Excellent. Let me know what I can do to help.

-Greg
On Thursday, May 26, 2011 at 7:37 AM, Justin Sheehy wrote: 
> Hi, Greg.
> 
> Thanks for your thoughtful analysis and the pull request.
> 
> On Thu, May 26, 2011 at 1:54 AM, Greg Nelson  wrote:
> 
> > However, the skipping bit isn't part of
> > Riak's preflist calculation. Instead, nodes claim partitions in such a way
> > as to be spaced out by target_n_val, to obviate the need for skipping.
> 
> A fun bit of history here: once upon a time, Riak's claiming worked in
> the same way as described by Amazon, with "skipping" and all. We
> noticed that this approach caused a different set of operational
> difficulties when hinted handoff due to node outages was occurring at
> the same time as a membership change. That prompted changes to the
> claim algorithm, which we still consider an area deserving of active
> improvement.
> 
> Multiple people will be reading, analyzing, and testing your work to
> contribute to this improvement. We very much appreciate your efforts,
> and want to make sure that we incorporate them in the best possible
> way.
> 
> Thanks,
> 
> -Justin
> 


Re: Riak doesn't use consistent hashing

2011-05-26 Thread Greg Nelson
The "not found" issue is a different one, but related. The issue there is that 
when a node joins the ring, the ring state is immediately changed. However, it 
takes time to handoff partitions to new owners. During that time, if a request 
comes in for data which has > r of its replicas on partitions which changed 
ownership, the new owners will reply "not found" if they don't have the data 
yet.

It's possible that *all* the partitions in a preflist changed ownership, 
especially in circumstances I described with re-striping. So no r_val can help 
you there.

And actually, even if only 2 of the 3 (assuming n_val=3) partitions in the 
preflist moved, a read with r=1 *still* won't work because of an optimization 
called "basic quorum". That is, if the majority of replicas come back "not 
found" the coordinator will reply with "not found" instead of waiting to see if 
the other nodes respond with something.
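The basic-quorum short-circuit described above can be modeled in a few lines. This is a simplified Ruby sketch (assuming each vnode answers either a value or :notfound), not Riak's actual Erlang read FSM:

```ruby
# Simplified "basic quorum" read: with n replicas and read quorum r, the
# coordinator fails fast once a majority of the n vnodes answer :notfound,
# instead of waiting to see whether r successful reads ever arrive.
def read_result(responses, n: 3, r: 1)
  notfounds = 0
  values = []
  responses.each do |resp|
    if resp == :notfound
      notfounds += 1
      return :notfound if notfounds > n / 2 # majority notfound: give up early
    else
      values << resp
      return values.first if values.size >= r
    end
  end
  :notfound
end

# Two of three replicas changed ownership and answer :notfound before the
# remaining old replica can reply with the value:
read_result([:notfound, :notfound, "value"]) # => :notfound, despite r=1
```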

Furthermore, the time it takes for handoffs to finish (or even start) can be a 
long time because vnodes will wait for periods of inactivity before doing 
handoff, and there are also restrictions on how many handoffs can happen at a 
time. You can tune those configurations with the handoff_concurrency and 
vnode_inactivity_timeout parameters in the riak_core section of app.config.

I believe the next release will have an option for turning off basic quorum. It 
will also have options that will allow your client to tell the difference 
between a real "not found" and one where r was not satisfied. And a further out 
release will have a proper fix for this whole issue. Probably involving 
forwarding of requests to old owners in cases where the handoff hasn't finished.

-Greg
On Thursday, May 26, 2011 at 7:21 AM, Ben Tilly wrote:
> Performance is fine.  However requests get a "not found" response for an 
> extended period of time.  See 
> http://lists.basho.com/pipermail/riak-users_lists.basho.com/2011-May/thread.html#4078
> for previous discussion of what sounds like the same issue.
> 
> On Thu, May 26, 2011 at 6:57 AM, Jonathan Langevin 
>  wrote:
> >  That sounds quite disconcerting. What happens to the performance of the 
> > cluster when this occurs?
> > 
> > 
> > Jonathan Langevin
> > Systems Administrator
> > Loom Inc.
> > Wilmington, NC: (910) 241-0433 - jlange...@loomlearning.com - 
> > www.loomlearning.com - Skype: intel352 
> > 
> > 
> > On Thu, May 26, 2011 at 1:54 AM, Greg Nelson  wrote:
> > I've been doing some digging through the details of how a node joins a 
> > cluster. When you hear that Riak uses consistent hashing, you'd expect it 
> > to distribute keys to nodes by hashing keys onto the ring AND hashing nodes 
> > onto the ring. Keys belong to the closest node on the ring, in the 
> > clockwise direction. Add a node, it hashes onto the ring and takes over 
> > some keys. Ordinarily the node would hash onto the ring in several places, 
> > to achieve better spread. Some data (roughly 1 / #nodes) moves to the new 
> > node from each of the other nodes, and everything else stays the same. 
> > > 
> > > In what Amazon describes as operationally simpler (strategy 3 in the 
> > > Dynamo paper), the ring is instead divided into equally-sized partitions. 
> > > Nodes are hashed onto the ring, and preflists are calculated by walking 
> > > clockwise from a partition, skipping partitions on already visited nodes. 
> > > Riak does something similar: it divides the ring into equally-sized 
> > > partitions, then nodes "randomly" claim partitions. However, the skipping 
> > > bit isn't part of Riak's preflist calculation. Instead, nodes claim 
> > > partitions in such a way as to be spaced out by target_n_val, to obviate 
> > > the need for skipping. 
> > > 
> > > Now, getting back to what happens when a node joins. The new node 
> > > calculates a new ring state that maintains the target_n_val invariant, as 
> > > well as trying to keep even spread of partitions per node. The algorithm 
> > > (default_choose_claim) is heuristic and greedy in nature, and recursively 
> > > transfers partitions to the new node until optimal spread is achieved, 
> > > maintaining target_n_val along the way. But if -- during one of those 
> > > recursive calls -- it can't meet the target_n_val, it will throw up its 
> > > hands and completely re-do the whole ring (by calling claim_rebalance_n). 
> > > Striping the partitions across nodes, in a round-robin fashion. When that 
> > > happens, most of the data needs to be handed off between nodes. 
> > > 
> > > This happens a lot, with many ring sizes. With ring_creation_size=128 
> > > (i.e., 128 partitions), it will happen when adding node 9 (87.5% of data 
> > > moves), adding node 12 (82%), adding node 15 (80%), adding node 19 (94%). 
> > > It happens with all ring sizes >= 128 (256, 512, 1024, ...). It appears 
> > > that any ring_creation_size (64 by default) is safe for growing to 8 
> > > nodes or so. But if you want to go beyond that... A ring size of >= 128 
> > > with 

Re: Riak Client Resources, Deleting a Key Doesn't Remove it from bucket.keys

2011-05-26 Thread Sean Cribbs
> 
> 1)  I would rather be hit by a large cost that I can see and feel instead of 
> trying to run down hidden keys from a stale cache (reflect on chasing memory 
> corruptions...)

I think that's fair; it emphasizes the fact that if you shouldn't use it once, 
you shouldn't be using it twice! (or at least manage the list yourself and all 
that entails)

> 2)  Make a riak-configuration value to enable or disable it.  You have to 
> explicitly go turn it on to use it.   It's more of an "if you turn this on, 
> you implicitly accept the penalties and issues surrounding actually using it."
> 

A configuration variable isn't necessary; at least from the HTTP interface, you 
have to explicitly request keys, and list-keys on PBC is a separate request.  
Either way, you really need to call it explicitly (except in the case of 
Document::all, which is sort of a separate issue altogether).  The question is 
more how the higher-level client should behave; we're not yet prepared to 
remove it entirely from Riak.

In speaking with Kyle and some of the other committers, we've come to a new 
decision:

1) The cached key-list will be removed altogether.
2) We may introduce a console warning when you invoke list-keys, with the 
option to turn it off if you set a really verbose and annoying configuration 
option in the client. Something like 
:yes_i_really_really_want_to_list_keys_dont_warn_me_ever => true.  The warning 
would be active for Client#buckets as well as Bucket#keys.

Sean Cribbs 
Developer Advocate
Basho Technologies, Inc.
http://basho.com/




Re: Riak Client Resources, Deleting a Key Doesn't Remove it from bucket.keys

2011-05-26 Thread Jonathan Langevin
I think that configuration key should be a bit more verbose :-)

Jonathan Langevin
Systems Administrator
Loom Inc.
Wilmington, NC: (910) 241-0433 - jlange...@loomlearning.com -
www.loomlearning.com - Skype: intel352


On Thu, May 26, 2011 at 3:12 PM, Sean Cribbs  wrote:

> :yes_i_really_really_want_to_list_keys_dont_warn_me_ever => true
>


Re: riaksearch: using index docs in place of real objects

2011-05-26 Thread Mathias Meyer
This behavior is specific to the Solr interface. It first fetches document IDs 
matching the criteria and then fetches the documents from Riak KV. Using the 
Erlang interface you can fetch just the IDs. It would certainly make sense to 
add an option like that, but it'd be inconsistent with Solr's interface.

As far as I know (and I tried to confirm this in the dark depths of the Solr 
wiki), you can tell Solr to store specific indexed fields, but not to specify 
an option at query time to omit stored fields. Correct me if I'm wrong, 
though; that would be more than reason enough (and probably simple enough) to 
add something like that to the Riak Search Solr API.

Mathias Meyer
Developer Advocate, Basho Technologies


On Thursday, May 26, 2011 at 20:50, Greg Pascale wrote:

> Thanks Mathias,
> 
> We'll continue to do that then.
> 
> It seems to me, though, that in the common case you aren't interested in the 
> index docs when you do a search, so it's needlessly inefficient to retrieve 
> them. Might it make sense to add a search option to not return the index docs 
> if you don't care about them?
> 
> -Greg
> 
> > On Thu, May 26, 2011 at 6:42 AM, Mathias Meyer (math...@basho.com) wrote:
> >  Greg,
> > 
> >  Riak Search stores indexed documents in Riak KV too, as serialized Erlang 
> > terms. You can easily verify that by requesting a document from 
> > http://riak.host:8098/riak/_rsid_/key.
> > 
> >  So whenever you query something through the Solr interface the documents 
> > you get back are fetched from these buckets, and therefore the same 
> > distribution and consistency properties apply to them as to objects stored 
> > directly in Riak KV. Bottom line is there's nothing wrong with just using 
> > them instead of fetching them again from Riak KV.
> > 
> >  Mathias Meyer
> >  Developer Advocate, Basho Technologies
> > 
> > 
> >  On Wednesday, May 25, 2011 at 00:34, Greg Pascale wrote:
> > 
> > > Hi,
> > > 
> > > In our data model, our riak objects are flat JSON objects, and thus their 
> > > corresponding index documents are nearly identical - the only difference 
> > > is that a few fields which are ints in the riak objects are strings in 
> > > the index doc.
> > > 
> > > Since they are so similar, we are directly using the index docs returned 
> > > from our search call, skipping the second step of doing gets on the 
> > > returned keys to retrieve the real objects.
> > > 
> > > Is this advisable? Are there any circumstances under which we might run 
> > > into consistency issues?
> > > 
> > > Thanks,
> > > -Greg
> > > 
> > > ___
> > > riak-users mailing list
> > > riak-users@lists.basho.com (mailto:riak-users@lists.basho.com) 
> > > (mailto:riak-users@lists.basho.com)
> > > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com





Re: riaksearch: using index docs in place of real objects

2011-05-26 Thread Eric Moritz
Out of curiosity what is the key in this URL?
http://riak.host:8098/riak/_rsid_/key

On Thu, May 26, 2011 at 9:42 AM, Mathias Meyer  wrote:
> Greg,
>
> Riak Search stores indexed documents in Riak KV too, as serialized Erlang 
> terms. You can easily verify that by requesting a document from 
> http://riak.host:8098/riak/_rsid_/key.
>
> So whenever you query something through the Solr interface the documents you 
> get back are fetched from these buckets, and therefore the same distribution 
> and consistency properties apply to them as to objects stored directly in 
> Riak KV. Bottom line is there's nothing wrong with just using them instead of 
> fetching them again from Riak KV.
>
> Mathias Meyer
> Developer Advocate, Basho Technologies
>
>
> On Wednesday, May 25, 2011 at 00:34, Greg Pascale wrote:
>
>> Hi,
>>
>> In our data model, our riak objects are flat JSON objects, and thus their 
>> corresponding index documents are nearly identical - the only difference is 
>> that a few fields which are ints in the riak objects are strings in the 
>> index doc.
>>
>> Since they are so similar, we are directly using the index docs returned 
>> from our search call, skipping the second step of doing gets on the returned 
>> keys to retrieve the real objects.
>>
>> Is this advisable? Are there any circumstances under which we might run into 
>> consistency issues?
>>
>> Thanks,
>> -Greg
>>
>> ___
>> riak-users mailing list
>> riak-users@lists.basho.com (mailto:riak-users@lists.basho.com)
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>



Re: riaksearch: using index docs in place of real objects

2011-05-26 Thread Greg Pascale
Eric, I believe the key is the document id, which will be the same as the
key of the corresponding object in .

-Greg

On Thu, May 26, 2011 at 12:41 PM, Eric Moritz wrote:

> Out of curiosity what is the key in this URL?
> http://riak.host:8098/riak/_rsid_/key
>
> On Thu, May 26, 2011 at 9:42 AM, Mathias Meyer  wrote:
> > Greg,
> >
> > Riak Search stores indexed documents in Riak KV too, as serialized Erlang
> terms. You can easily verify that by requesting a document from
> http://riak.host:8098/riak/_rsid_/key.
> >
> > So whenever you query something through the Solr interface the documents
> you get back are fetched from these buckets, and therefore the same
> distribution and consistency properties apply to them as to objects stored
> directly in Riak KV. Bottom line is there's nothing wrong with just using
> them instead of fetching them again from Riak KV.
> >
> > Mathias Meyer
> > Developer Advocate, Basho Technologies
> >
> >
> > On Wednesday, May 25, 2011 at 00:34, Greg Pascale wrote:
> >
> >> Hi,
> >>
> >> In our data model, our riak objects are flat JSON objects, and thus
> their corresponding index documents are nearly identical - the only
> difference is that a few fields which are ints in the riak objects are
> strings in the index doc.
> >>
> >> Since they are so similar, we are directly using the index docs returned
> from our search call, skipping the second step of doing gets on the returned
> keys to retrieve the real objects.
> >>
> >> Is this advisable? Are there any circumstances under which we might run
> into consistency issues?
> >>
> >> Thanks,
> >> -Greg
> >>
> >> ___
> >> riak-users mailing list
> >> riak-users@lists.basho.com (mailto:riak-users@lists.basho.com)
> >> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> >
> >
> >
> > ___
> > riak-users mailing list
> > riak-users@lists.basho.com
> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> >
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>


Re: riaksearch: using index docs in place of real objects

2011-05-26 Thread Mathias Meyer
That is correct, Greg. It's either determined by the key used to store the 
object in Riak KV (given the precommit hook is used), or by a key specified 
when indexing directly into Riak Search, using e.g. the Solr or the Erlang API. 
There'll always be a key required, and that'll be used to look up the 
serialized document in Riak KV.

Mathias Meyer
Developer Advocate, Basho Technologies


On Thursday, May 26, 2011 at 21:56, Greg Pascale wrote:

> Eric, I believe the key is the document id, which will be the same as the key 
> of the corresponding object in .
> 
> -Greg
> 
> On Thu, May 26, 2011 at 12:41 PM, Eric Moritz (e...@themoritzfamily.com) wrote:
> > Out of curiosity what is the key in this URL?
> > http://riak.host:8098/riak/_rsid_/key
> > 
> > On Thu, May 26, 2011 at 9:42 AM, Mathias Meyer (math...@basho.com) wrote:
> > > Greg,
> > > 
> > > Riak Search stores indexed documents in Riak KV too, as serialized Erlang 
> > > terms. You can easily verify that by requesting a document from 
> > > http://riak.host:8098/riak/_rsid_/key.
> > > 
> > > So whenever you query something through the Solr interface the documents 
> > > you get back are fetched from these buckets, and therefore the same 
> > > distribution and consistency properties apply to them as to objects 
> > > stored directly in Riak KV. Bottom line is there's nothing wrong with 
> > > just using them instead of fetching them again from Riak KV.
> > > 
> > > Mathias Meyer
> > > Developer Advocate, Basho Technologies
> > > 
> > > 
> > > On Wednesday, May 25, 2011 at 00:34, Greg Pascale wrote:
> > > 
> > > > Hi,
> > > > 
> > > > In our data model, our riak objects are flat JSON objects, and thus 
> > > > their corresponding index documents are nearly identical - the only 
> > > > difference is that a few fields which are ints in the riak objects are 
> > > > strings in the index doc.
> > > > 
> > > > Since they are so similar, we are directly using the index docs 
> > > > returned from our search call, skipping the second step of doing gets 
> > > > on the returned keys to retrieve the real objects.
> > > > 
> > > > Is this advisable? Are there any circumstances under which we might run 
> > > > into consistency issues?
> > > > 
> > > > Thanks,
> > > > -Greg
> > > > 
> > > > ___
> > > > riak-users mailing list
> > > > riak-users@lists.basho.com (mailto:riak-users@lists.basho.com) 
> > > > (mailto:riak-users@lists.basho.com)
> > > > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> > > 
> > > 
> > > 
> > > ___
> > > riak-users mailing list
> > > riak-users@lists.basho.com (mailto:riak-users@lists.basho.com)
> > > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> > 
> >  ___
> >  riak-users mailing list
> > riak-users@lists.basho.com (mailto:riak-users@lists.basho.com)
> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com





riak locking and out of memory

2011-05-26 Thread Ron Yang
Hi, I am playing around with Riak and have set up a two-node cluster:
one node on a desktop Ubuntu machine, and the other on a MacBook Pro
running OS X 10.5.

On the MacBook I looped over 400 MB files using bash and curl to
upload them as documents into a bucket:
for a in *.gz; do curl -v http://127.0.0.1:8098/riak/bub/$a
--data-binary @$a; done

While this was happening I poked at the local riak server on the
desktop using curl.  Curiously, when I issued this command on node 1:
curl -v http://localhost:8098/riak/bub?keys=true

it blocked, apparently waiting for the current file to finish
uploading on node 2.  I thought there wasn't any locking?  Was it
waiting for a quorum?

When I came back from lunch I found that node 2 died not long after with this:
binary_alloc: Cannot allocate 438689176 bytes of memory (of type "binary").
(But I see in the FAQ that 50MB is the recommended largest document size)

Are there any tracing or debugging facilities that I can use to
diagnose latencies or execution plans?

Thanks.



Re: riak locking and out of memory

2011-05-26 Thread Justin Sheehy
Hi, Ron.

On Thu, May 26, 2011 at 4:33 PM, Ron Yang  wrote:

> On the macbook I looped across 400meg files using bash and curl to
> upload them as documents into a bucket:

There are other details in your post that I might comment on, but I
will focus on the main point.

What you describe here simply will not work.  Single documents in Riak
at that size are going to cause problems.  There is an interface atop
Riak ("Luwak") which can handle such things just fine, if large file
storage is your main use case.

-Justin



Re: Issues with capacity planning pages on wiki

2011-05-26 Thread Anthony Molinaro

Hi Justin,

   Thanks for the reply.  Good to know you may have some partial
solutions for the sizings of items.  Our use case may, long term,
require us to write our own backend just for space efficiency, but
I'm hoping we can make it quite far with bitcask.  I've got enough on
my plate at the moment so am unlikely to get to this before you guys
do, but as I recently forked riak_kv for something else, you never
know.

Thanks

-Anthony

On Wed, May 25, 2011 at 08:10:29PM -0400, Justin Sheehy wrote:
> Hi, Anthony.
> 
> There are really three different things below:
> 
> 1- reducing the minimum overhead of the {Bucket, Key} encoding when
> riak is storing into bitcask
> 
> 2- reducing the size of the vector clock encoding
> 
> 3- reducing the size of the overall riak_object structure and metadata
> 
> All three of these are worth doing.  The reason they are the way they
> are now is that the initial assumptions for most Riak deployments was
> of a high enough mean object size that these few bytes per object
> would proportionally be small noise -- but that's just history and not
> a reason to avoid improvements.
> 
> In fact, preliminary work has been done on all three of these.  It
> just hasn't yet been such a high priority that it got pushed through
> to the finish.  One tricky part with all three is backward
> compatibility, as most production Riak clusters do not expect to need
> a full stop every time we want to make an improvement like these.
> 
> Solving #1, by the way, isn't really in bitcask itself but rather in
> riak_kv_bitcask_backend.  I can take a swing at that (with backward
> compatibility) shortly.  I might also be able to help dig up some of
> the old work on #2 that is nearly a year old, and I think Andy Gross
> may have done some of what's needed for #3.
> 
> With less words: I agree, all this should be made smaller.
> 
> And don't let this stop you if you want to jump ahead and give some of it a 
> try!
> 
> -Justin
> 
> 
> 
> On Wed, May 25, 2011 at 1:50 PM, Anthony Molinaro
>  wrote:
> 
> > Anyway, things make a lot more sense now, and I'm thinking I may need
> > to fork bitcask and get rid of some of that extra overhead.  For instance
> > 13 bytes of overhead to store a tuple of binaries seems unnecessary, it's
> > probably better to just have a single binary with the bucket size as a
> > prefix, so something like
> >
> > <<BucketSize:16, Bucket/binary, Key/binary>>
> >
> > That way you turn 13 bytes of overhead to 2.
> >
> > Of course I'd need some way to work with old data, but a one time migration
> > shouldn't be too bad.
> >
> > It also seems like there should be some way to trim down some of that on
> > disk usage.  I mean 300+ bytes to store 36 bytes is a lot.

-- 

Anthony Molinaro   



Eric Moritz Added As A Wiki Committer

2011-05-26 Thread Mark Phillips
Hey All -

Some good news from the community I wanted to pass along: we just
added Eric Moritz as a Committer to the Riak Wiki.

There's a short blog post about it here --->
http://blog.basho.com/2011/05/26/Eric-Moritz-is-Now-A-Wiki-Committer

Be sure to give Eric a (virtual) high five next time you see him in #riak.

Congrats, Eric!

Mark



hiding buckets and keys

2011-05-26 Thread Antonio Rohman Fernandez


hello all,

http://IP:8098/riak?buckets=true [ will show all available buckets on Riak ]
http://IP:8098/riak/bucketname?keys=true&props=false [ will show all available keys on a bucket ]

To me, this poses a very big security risk: if somebody discovers your Riak 
server's IP, it is very easy to read all the information from it, even if you 
try to obfuscate the buckets/keys... everything is highly readable. Is there 
any way to disable those options, like {riak_kv_stat, false} hides the /stats 
page?

thanks

Rohman

Antonio Rohman Fernandez
CEO, Founder & Lead Engineer
roh...@mahalostudio.com (http://mahalostudio.com)

Projects:
MaruBatsu.es (http://marubatsu.es)
PupCloud.com (http://pupcloud.com)
Wedding Album (http://wedding.mahalostudio.com)


Re: hiding buckets and keys

2011-05-26 Thread OJ Reeves
Rohman,

In our case, the only nodes that are allowed to hit the Riak cluster are
those of our applications. We do not allow access to the Riak nodes from the
public Internet. Firewall rules are in place to prevent this in some cases,
and in others the Riak nodes themselves are on internal networks. In general
I think either of these approaches is sound (I'm happy to be corrected ;)).
Perhaps you should look into something similar?

Best regards

OJ

On 27 May 2011 14:55, Antonio Rohman Fernandez wrote:

> hello all,
>
> http://IP:8098/riak?buckets=true [ will show all available buckets on Riak
> ]
> http://IP:8098/riak/bucketname?keys=true&props=false [ will show all
> available keys on a bucket ]
>
> to me, this proves a very big security risk, as if somebody discovers your
> Riak server's IP, is very easy to read all the information from it, even if
> you try to obfuscate the buckets/keys... everything is highly readable.
> there is any way to disable those options? like {riak_kv_stat, false} hides
> the /stats page
>
> thanks
>
> Rohman
>
> *Antonio Rohman Fernandez*
> CEO, Founder & Lead Engineer
> roh...@mahalostudio.com
> *Projects*
> MaruBatsu.es
> PupCloud.com
> Wedding Album
>
>


-- 

OJ Reeves
http://buffered.io/


Re: hiding buckets and keys

2011-05-26 Thread Alexander Sicular
Hi Rohman,

It is not recommended that you deploy Riak on the public internet. Keep all 
access private, then use iptables on each individual node to restrict access 
to your known upstream clients.

Ports to keep in mind - 

http(s) port (8098)
protocol buffers port (8099)
epmd (4369)
the range of dynamic ports Erlang uses to communicate with other Erlang nodes 
(which you can force into a fixed range).

The latter is not part of the default configuration, but I think it should be, 
at least commented out in app.config.

Put it right at the top of the config array above the riak_core directives like 
so:

[

%% limit dynamic ports erlang uses to communicate
%% pick some range that works in your environment 
%{kernel, [
%  {inet_dist_listen_min, 21000}, 
%  {inet_dist_listen_max, 22000}
%]},


%% Riak Core config
{riak_core, [
...
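To sketch what the iptables side might look like (a minimal, hypothetical example - the 10.0.0.0/24 trusted subnet, the interface, and the 21000-22000 Erlang range are assumptions; adjust them for your own network and app.config):

```shell
# Allow the Riak-related ports only from the trusted application subnet
# (assumed here to be 10.0.0.0/24).
# Ports: 8098 (HTTP), 8099 (protocol buffers), 4369 (epmd),
# plus the restricted Erlang distribution range (assumed 21000-22000).
iptables -A INPUT -s 10.0.0.0/24 -p tcp -m multiport \
  --dports 8098,8099,4369,21000:22000 -j ACCEPT

# Drop the same ports from everyone else.
iptables -A INPUT -p tcp -m multiport \
  --dports 8098,8099,4369,21000:22000 -j DROP
```

Rules like these would go on every node in the cluster; the distribution-port range must match whatever you set in the kernel section above.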


Cheers,


Alexander Sicular
@siculars
http://sicuars.posterous.com


On Friday, May 27, 2011 at 12:55 AM, Antonio Rohman Fernandez wrote:

> hello all,
> 
> http://IP:8098/riak?buckets=true [ will show all available buckets on Riak ]
> http://IP:8098/riak/bucketname?keys=true&props=false [ will show all 
> available keys on a bucket ]
> 
> to me, this proves a very big security risk, as if somebody discovers your 
> Riak server's IP, is very easy to read all the information from it, even if 
> you try to obfuscate the buckets/keys... everything is highly readable.
> there is any way to disable those options? like {riak_kv_stat, false} hides 
> the /stats page
> 
> thanks
> 
> Rohman 
> 
> 
> Antonio Rohman Fernandez
> CEO, Founder & Lead Engineer
> roh...@mahalostudio.com (mailto:roh...@mahalostudio.com)
> 
> Projects
> MaruBatsu.es (http://marubatsu.es)
> PupCloud.com (http://pupcloud.com)
> Wedding Album (http://wedding.mahalostudio.com) 
> ___
> riak-users mailing list
> riak-users@lists.basho.com (mailto:riak-users@lists.basho.com)
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



Re: hiding buckets and keys

2011-05-26 Thread Antonio Rohman Fernandez
"In our case, the only nodes that are allowed to hit the Riak cluster are those of our applications"... what if your app is more complex than that and you have thousands of servers all around the world ( different datacenters, different networks ) with crawlers, scanners, blackboxes, etc... all communicating with Riak and adding/removing new scanners/crawlers/blackboxes/etc... every now and then... quite troublesome to set up and maintain a firewall for that.

"It is not recommended that you deploy Riak on the public internet"... what if apart from webservers with a web-app i want to build iPhone/iPad/Android apps that access Riak directly? one thing i love from Riak is its RESTful architecture, but if i have to build some API somewhere for the mobile apps to interact with Riak... well... the 'cloud' paradigm just vanished for me... also, i would have a single point of failure on the API implementation.

any other suggestions?
Rohman
On Fri, 27 May 2011 01:20:00 -0400, Alexander Sicular  wrote:


Hi Rohman,

It is not recommended that you deploy Riak on the public internet. Keep all access private and then implement iptables on each individual node securing access to upstream clients.

Ports to keep in mind - 

http(s) port (8098)
protocol buffers port (8099)
epmd (4369)
forcing the range of ports erlang uses to communicate amongst other erlang nodes.

The latter is not part of the default configuration but I think it should be. At least commented out in app.config. 


Put it right at the top of the config array above the riak_core directives like so:




[


%% limit dynamic ports erlang uses to communicate
%% pick some range that works in your environment 
%{kernel, [
%   {inet_dist_listen_min, 21000},   
%   {inet_dist_listen_max, 22000}
%]},


 %% Riak Core config
 {riak_core, [
...


Cheers,
 

Alexander Sicular
@siculars
http://sicuars.posterous.com

On Friday, May 27, 2011 at 12:55 AM, Antonio Rohman Fernandez wrote:



hello all,

http://IP:8098/riak?buckets=true [ will show all available buckets on Riak ]
http://IP:8098/riak/bucketname?keys=true&props=false [ will show all available keys on a bucket ]

to me, this proves a very big security risk, as if somebody discovers your Riak server's IP, is very easy to read all the information from it, even if you try to obfuscate the buckets/keys... everything is highly readable.
there is any way to disable those options? like {riak_kv_stat, false} hides the /stats page

thanks
Rohman









-- 
Antonio Rohman Fernandez
CEO, Founder & Lead Engineer
roh...@mahalostudio.com
Projects: MaruBatsu.es, PupCloud.com, Wedding Album


Re: hiding buckets and keys

2011-05-26 Thread Matt Ranney
On Thu, May 26, 2011 at 8:10 PM, Antonio Rohman Fernandez <
roh...@mahalostudio.com> wrote:

> what if apart from webservers with a web-app i want to build
> iPhone/iPad/Android apps that access Riak directly?


Unfortunately, Riak just isn't designed for that.  You might be able to work
around it somehow, but you'll be going against the grain.  It's probably
best if you get some other set of servers that both handle the external HTTP
traffic and talk to Riak on the clients' behalf.


Re: hiding buckets and keys

2011-05-26 Thread Russell Brown

On 27 May 2011, at 07:10, Antonio Rohman Fernandez wrote:

> "In our case, the only nodes that are allowed to hit the Riak cluster are 
> those of our applications"... what if your app is more complex than that and 
> you have thousands of servers all around the world ( different datacenters, 
> different networks ) with crawlers, scanners, blackboxes, etc... all 
> communicating with Riak and adding/removing new 
> scanners/crawlers/blackboxes/etc... every now and then... quite troublesome 
> to set up and maintain a firewall for that.
> 
> "It is not recommended that you deploy Riak on the public internet"... what 
> if apart from webservers with a web-app i want to build iPhone/iPad/Android 
> apps that access Riak directly? one thing i love from Riak is its RESTfull 
> architecture, but if i have to build some API somewhere for the mobile apps 
> to interact with Riak... well... the 'cloud' paradigm just vanished for me... 
> also, i would have a single point of failure on the API implementation.
> 
> any other suggestions?
> 
> 
Something like nginx set up as a reverse proxy, with rewrite rules/filters for 
the URLs you consider a security risk? One nginx instance per Riak instance, 
with Riak only available on localhost and nginx facing the outside world?
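A minimal sketch of that kind of front-end (assumptions: Riak bound to 127.0.0.1:8098, and we simply refuse the listing query strings from this thread - tune the patterns for whatever you consider sensitive):

```nginx
server {
    listen 80;

    location /riak {
        # Refuse the bucket- and key-listing queries discussed above.
        if ($args ~* "buckets=true") { return 403; }
        if ($args ~* "keys=(true|stream)") { return 403; }

        # Everything else is proxied through to the local Riak node.
        proxy_pass http://127.0.0.1:8098;
    }
}
```

With Riak bound only to localhost, clients can never bypass the filter by hitting 8098 directly.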
> Rohman
> 
> On Fri, 27 May 2011 01:20:00 -0400, Alexander Sicular  
> wrote:
> 
>> Hi Rohman,
>> 
>> It is not recommended that you deploy Riak on the public internet. Keep all 
>> access private and then implement iptables on each individual node securing 
>> access to upstream clients.
>> 
>> Ports to keep in mind - 
>> 
>> http(s) port (8098)
>> protocol buffers port (8099)
>> epmd (4369)
>> forcing the range of ports erlang uses to communicate amongst other erlang 
>> nodes.
>> 
>> The latter is not part of the default configuration but I think it should 
>> be. At least commented out in app.config.
>> 
>> Put it right at the top of the config array above the riak_core directives 
>> like so:
>> 
>> [
>> %% limit dynamic ports erlang uses to communicate
>> %% pick some range that works in your environment 
>> %{kernel, [
>> %   {inet_dist_listen_min, 21000},   
>> %   {inet_dist_listen_max, 22000}
>> %]},
>>  %% Riak Core config
>>  {riak_core, [
>> ...
>> Cheers,
>>  
>> Alexander Sicular
>> @siculars
>> http://sicuars.posterous.com
>> 
>> On Friday, May 27, 2011 at 12:55 AM, Antonio Rohman Fernandez wrote:
>> 
>> hello all,
>> 
>> http://IP:8098/riak?buckets=true [ will show all available buckets on Riak ]
>> http://IP:8098/riak/bucketname?keys=true&props=false [ will show all 
>> available keys on a bucket ]
>> 
>> to me, this proves a very big security risk, as if somebody discovers your 
>> Riak server's IP, is very easy to read all the information from it, even if 
>> you try to obfuscate the buckets/keys... everything is highly readable.
>> there is any way to disable those options? like {riak_kv_stat, false} hides 
>> the /stats page
>> 
>> thanks
>> 
>> Rohman
>> 
>> 
>> Antonio Rohman Fernandez
>> CEO, Founder & Lead Engineer
>> roh...@mahalostudio.com  Projects
>> MaruBatsu.es
>> PupCloud.com
>> Wedding Album
>> 
> -- 
> 
>   Antonio Rohman Fernandez
> CEO, Founder & Lead Engineer
> roh...@mahalostudio.com   Projects
> MaruBatsu.es
> PupCloud.com
> Wedding Album
> 



Re: hiding buckets and keys

2011-05-26 Thread Antonio Rohman Fernandez
"riak only available on localhost and nginx facing the outside world"... that sounds like something worth trying! thanks.

even so, i still think it would be great to have some options to enable/disable those "?buckets=true" and "?keys=true"

Rohman
On Fri, 27 May 2011 07:40:45 +0100, Russell Brown  wrote:


On 27 May 2011, at 07:10, Antonio Rohman Fernandez wrote:


"In our case, the only nodes that are allowed to hit the Riak cluster are those of our applications"... what if your app is more complex than that and you have thousands of servers all around the world ( different datacenters, different networks ) with crawlers, scanners, blackboxes, etc... all communicating with Riak and adding/removing new scanners/crawlers/blackboxes/etc... every now and then... quite troublesome to set up and maintain a firewall for that.

"It is not recommended that you deploy Riak on the public internet"... what if apart from webservers with a web-app i want to build iPhone/iPad/Android apps that access Riak directly? one thing i love from Riak is its RESTfull architecture, but if i have to build some API somewhere for the mobile apps to interact with Riak... well... the 'cloud' paradigm just vanished for me... also, i would have a single point of failure on the API implementation.

any other suggestions?


Something like nginx set up as a reverse proxy with rewrite rules/filters for URLs you consider a security risk? Instance per riak instance, riak only available on localhost and nginx facing the outside world?

Rohman
On Fri, 27 May 2011 01:20:00 -0400, Alexander Sicular  wrote:


Hi Rohman,

It is not recommended that you deploy Riak on the public internet. Keep all access private and then implement iptables on each individual node securing access to upstream clients.

Ports to keep in mind - 

http(s) port (8098)
protocol buffers port (8099)
epmd (4369)
forcing the range of ports erlang uses to communicate amongst other erlang nodes.

The latter is not part of the default configuration but I think it should be. At least commented out in app.config. 


Put it right at the top of the config array above the riak_core directives like so:




[

%% limit dynamic ports erlang uses to communicate
%% pick some range that works in your environment 
%{kernel, [
%   {inet_dist_listen_min, 21000},   
%   {inet_dist_listen_max, 22000}
%]},

 %% Riak Core config
 {riak_core, [
...
Cheers,
 

Alexander Sicular
@siculars
http://sicuars.posterous.com

On Friday, May 27, 2011 at 12:55 AM, Antonio Rohman Fernandez wrote:



hello all,

http://IP:8098/riak?buckets=true [ will show all available buckets on Riak ]
http://IP:8098/riak/bucketname?keys=true&props=false [ will show all available keys on a bucket ]

to me, this proves a very big security risk, as if somebody discovers your Riak server's IP, is very easy to read all the information from it, even if you try to obfuscate the buckets/keys... everything is highly readable.
there is any way to disable those options? like {riak_kv_stat, false} hides the /stats page

thanks
Rohman











-- 
Antonio Rohman Fernandez
CEO, Founder & Lead Engineer
roh...@mahalostudio.com
Projects: MaruBatsu.es, PupCloud.com, Wedding Album