DELETE not really delete key

2010-08-04 Thread Damien Hardy
Hello,

I suppose I am missing something important as a newbie to Riak (and NoSQL
architecture in general); just point me in the right direction :)

I installed a Riak 0.12 cluster on 3 servers without difficulty and inserted
some keys as well.

When I delete a key, it seems to work well too (I get a 404 error when
trying to fetch it directly, fine), BUT ...

The key is still visible to MapReduce functions (with the metadata tag
"X-Riak-Deleted":"true" added).

Do I have to filter out keys having "X-Riak-Deleted":"true" in their metadata
in every map function I want to use?
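If every map function has to skip deleted objects, the check can at least be factored into one helper instead of being repeated. The sketch below is in Python purely for illustration (real Riak map functions are written in JavaScript or Erlang), and the object layout, a dict with a `metadata` entry holding `"X-Riak-Deleted": "true"`, is an assumption modelled on the tag described above rather than Riak's actual MapReduce input format.

```python
from typing import Any, Dict, Iterable, List

# Metadata tag observed on deleted objects (the "tombstone" marker).
DELETED_FLAG = "X-Riak-Deleted"


def is_tombstone(obj: Dict[str, Any]) -> bool:
    """True if the object's metadata carries the deletion marker."""
    return str(obj.get("metadata", {}).get(DELETED_FLAG, "")).lower() == "true"


def map_live_values(objects: Iterable[Dict[str, Any]]) -> List[Any]:
    """Map phase that emits values only for objects that are not tombstones."""
    return [obj["value"] for obj in objects if not is_tombstone(obj)]


if __name__ == "__main__":
    sample = [
        {"key": "a", "value": 1, "metadata": {}},
        {"key": "b", "value": "", "metadata": {DELETED_FLAG: "true"}},
    ]
    print(map_live_values(sample))  # prints [1]
```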

Thank you for your help.

Regards,

-- 
Damien
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Use of fallback nodes for get requests?

2010-08-04 Thread Nico Meyer
Hi Justin,

I think we are coming from two different directions here, which leads to
some confusion. You seem to treat a get for a non-existent key as an
error, in which case all your points are valid, of course. I suspected
that this was the reason for the current design choice, but I didn't see
it stated explicitly anywhere. Also, notfound seems to be handled
differently from other types of errors, at least in the way it is
signalled to the client, so I didn't immediately think of it as an error
case.
On the other hand, there are many applications where asking for a key
that has never been put is perfectly valid, and not_found is indeed the
right answer in that case. Our application is an example of that. The
key is given (it is a unique cookie ID), and we need to check whether we
have seen a specific ID before in a certain context and, if so, retrieve
the data that was associated with the ID back then. More often than not
we have not, so notfound is the expected answer.

If you read my original mail again with that use case in mind it might
become clearer what my problem with the current design is.

Having to fulfil the precondition that we only do gets for keys we know
to have been put before would require another datastore for that
purpose, which seems awkward and unnecessary, since Riak has all
the data required to handle our use case.

Please let me know if I need to clarify my thoughts further.
English is not my first language, and it's hard enough to reason about
these things in German and face-to-face :-).

Cheers,
Nico

On Monday, 02.08.2010, 22:29 -0400, Justin Sheehy wrote:
> Hi, Nico.
> 
> On Mon, Aug 2, 2010 at 1:19 PM, Nico Meyer  wrote:
> 
> > What I mean is, if I do a get request for a key with R=N, and one of the
> > first N nodes in the preflist is down, the request will still succeed.
> > Why is that? Doesn't that undermine the purpose of setting R to a high
> > number (specifically setting it to N)? That way a request might succeed
> > even if all primary nodes responsible for the key are unavailable.
> 
> You are correct, and this is intentional.  There is nothing in the R
> or W settings that is intended to indicate anything at all about
> "primary" nodes.  It is rather simply the number of successful
> responses that the client wishes to wait for, and thus the degree of
> quorum sought before a client reply is sent.  Using fallback nodes to
> satisfy reads is a natural result of using fallback nodes to satisfy
> writes.
>
> If all primary nodes responsible for a key are unavailable, but enough
> of the fallback nodes for that key have received a value for that key
> since they went unavailable (through a fallback write) then a request
> to get that key might succeed.  I am not sure why you see this as a
> bad thing.
> 
> (It will only succeed if R nodes actually provide a successful result,
> not just if they are available.)
> 
> > On a similar note, why is the riak_kv_get_fsm waiting for at least
> > (N/2)+1 responses if there are only not_found responses, effectively
> > ignoring a smaller R value of the request if the key does not exist?
> 
> This is a compromise to deal with real situations that can occur where
> a single node might be taking a very long time to reply, and a value
> has never been stored for a given key.  Without either this basic
> quorum default for notfounds or alternately considering a notfound as
> success and thus only waiting for R of them, that situation would mean
> that an R=1 request would take much longer to complete than an R=2
> request (due to waiting for the slow node) which is confusing to most
> users.  Note that since it applies to notfounds, this tends to only
> come into play for items that have never been successfully stored with
> at least a basic quorum -- things that really are not present, that
> is.
> 
> > My guess was, that this also has to do with the use of fallback nodes:
> > Since the partition will usually be very small on the fallback/handoff
> > node, it is likely to be the first to answer. So to avoid returning
> > false not_found responses, a basic quorum is required.
> > Am I on the right track here?
> 
> It doesn't have anything to do with fallback nodes explicitly.  It is
> for situations where a node is under any condition that will slow it
> down significantly.  In such situations, there is little to be gained
> in waiting for all N replies if (N/2)+1 have already declared
> notfound.
> 
> > The problem is, this is imposed even when all nodes are up.
> > If one requires very low latency or very high availability (that's why
> > one uses a small R value in the first place) and does a lot of gets for
> > non-existent keys, Riak silently screws you over by raising R for those
> > keys.
> 
> It seems that there is something here worth clarifying.  If you are
> issuing requests with W+R<=N, and some reads following writes return
> notfound during an interval immediately following initial storage
> 

Re: Use of fallback nodes for get requests?

2010-08-04 Thread Nico Meyer
I just saw https://issues.basho.com/show_bug.cgi?id=275, which would
actually be just what I need.


Re: Structured data

2010-08-04 Thread Aníbal Rojas
I was going to suggest you take a look at Redis, but you are already
aware of it. Redis is a completely different beast from Riak, with a
completely different roadmap for distribution, which is complicated by its
commands operating on data structures.

You could look at Voldemort with its pluggable types; maybe it will work
for you.

In the end, for any non-trivial (non-toy) problem it is difficult not to end
up with at least 2 or 3 persistence solutions working together; there is no
silver bullet.

--
Aníbal

On Aug 3, 2010 9:17 AM, "James Sadler"  wrote:

Hi guys,

Is there any plan for supporting more than opaque blobs as values in
the future?  For example, lists, sets, hashes etc, including
server-side support for operating on those data-types?

In the meantime, I am curious about the feasibility of putting
riak_core in front of Redis, and supporting a richer API in the layers
above (which I suspect would need to be heavily modified to support
additional value manipulation functionality).
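Putting riak_core in front of Redis would essentially mean routing each key to a set of Redis instances through a consistent-hashing ring. The following is a hypothetical Python sketch of that routing layer only. The backend names, the "bucket/key" hashing scheme, the fixed 64-partition ring and the round-robin claim are all assumptions, loosely modelled on riak_core's SHA-1 ring rather than taken from its code; replication, handoff and the richer value operations asked about here are not covered.

```python
import hashlib
from typing import List, Sequence

HASH_SPACE = 2 ** 160  # size of the SHA-1 output space


class Ring:
    """Hypothetical consistent-hashing ring for routing keys to backends.

    The hash space is split into `num_partitions` equal slices (a power of
    two, so the slices divide 2**160 exactly), and partitions are claimed
    round-robin by the backends.
    """

    def __init__(self, backends: Sequence[str], num_partitions: int = 64):
        if num_partitions <= 0 or num_partitions & (num_partitions - 1):
            raise ValueError("num_partitions must be a positive power of two")
        self.num_partitions = num_partitions
        self.owners = [backends[i % len(backends)] for i in range(num_partitions)]

    def partition_for(self, bucket: str, key: str) -> int:
        # Assumption: hash the string "bucket/key"; riak_core hashes an Erlang term instead.
        digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
        point = int.from_bytes(digest, "big")
        return point // (HASH_SPACE // self.num_partitions)

    def preflist(self, bucket: str, key: str, n: int = 3) -> List[str]:
        """Owners of the n consecutive partitions starting at the key's partition."""
        start = self.partition_for(bucket, key)
        return [self.owners[(start + i) % self.num_partitions] for i in range(n)]


if __name__ == "__main__":
    ring = Ring(["redis-a:6379", "redis-b:6379", "redis-c:6379", "redis-d:6379"])
    print(ring.preflist("users", "alice"))  # three distinct hypothetical backends
```

Because 64 partitions are claimed round-robin by 4 backends, any 3 consecutive partitions (including those that wrap around the end of the ring) belong to 3 different backends; a real claim algorithm has to guarantee this property for arbitrary cluster sizes.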

Also, is anyone aware of anyone attempting such a project?

Cheers,

--
James



Riak Recap for 8/2 - 8/3

2010-08-04 Thread Mark Phillips
Afternoon, Evening, Morning to all,

We have an awesome Recap for today: Blog posts, a node.js client,
interviews, Gists, and some great questions from #riak.

Enjoy -

Mark

Community Manager
Basho Technologies
wiki.basho.com
twitter.com/pharkmillups

-

Riak Recap for 8/2 - 8/3

1) DZone put up a short piece about Kevin Smith's
"Introduction to Riak Core" blog post.

Post here ---> http://architects.dzone.com/articles/basho-packages-its-riak-core

2) Bruno Michel, who has written some great stuff about Riak releases
in the past, put out a short piece for Linuxfr.org called "Petites
brèves : NoSQL : Neo4J, Riak, Kyoto Cabinet et Graylog2" ("Short news:
NoSQL: Neo4J, Riak, Kyoto Cabinet and Graylog2"). Merci, Bruno!

Post here ---> http://bit.ly/aH0EM9

3) technoweenie has been dropping hints over the last few days in
#riak that his PB client for Node.js was almost ready for public
consumption. Yesterday, in the early morning hours, he delivered it
(with an accompanying blog post). Thanks, Rick!

Post here ---> http://techno-weenie.net/2010/8/3/protobuf-for-node

4) @jebui has been doing some great stuff with Riak. In addition to
contributing patches, he's also using it in production at inagist.com.
And now we have some details on how he is using it. This is an awesome
post! Thanks for sharing, Jebu!

Read this ---> http://post.ly/q0wH

5) Basho CTO and Riak hacker Justin Sheehy sat down with Sadek Drobi
at Erlang Factory some months ago for an InfoQ interview. It's a great
interview, so if you like Riak, NoSQL, and open source, check it out.

Interview here ---> http://bit.ly/baNz9h

6) seancribbs and vicmargar had a chat in #riak that started with,
"what is the recommended way to connect to riak from erlang? "

Gist here ---> http://gist.github.com/508657

7) Q --- When you have a cluster, should your client have a list of
host names and ports so it doesn't constantly hit the same node? (from
technoweenie via #riak)

A --- Having a list of available Riak nodes in your client is a
viable way to distribute requests, but using a TCP load balancer with a
"least connected" option is more robust.

8) Q --- I am trying to write some Erlang for postcommit indexing. I
compiled and added the module and function to the running Riak, added
the module/function to the bucket properties and then ran some test
queries. However, when I run it I get this error:

** Reason for termination = **
{{case_clause,{struct,[{<<"mod">>,<<"post_commit_index">>},
{<<"fun">>,<<"index_value">>}]}}, [{riak_kv_put_fsm,invoke_hook,3},
{riak_kv_put_fsm,waiting_vnode_dw,2}, {gen_fsm,handle_msg,7},
{proc_lib,init_p_do_apply,3}]}

I'm not sure where else to look for further indications of what is
wrong. Is anyone on who can assist a newb?

A --- It looks like you defined a single mod/fun object instead of a
list of mod/fun objects. If you change the postcommit bucket property to
a list of mod/fun objects rather than a single mod/fun object, you should
be good to go.

9) Q --- How can I watch a riak error log? (from technoweenie via #riak)

A --- "riak console" is one option. You can also tail /log/erlang.log.#

10) siculars, justinsheehy and seancribbs kicked around some ideas in
#riak about how bitcask might be used to list keys more efficiently.

Gist here ---> http://gist.github.com/508693
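For the postcommit question in item 8, the fix is purely about the shape of the bucket property: a list of mod/fun objects rather than a single one. Below is a hedged Python sketch that builds the corrected JSON body and prepares (without sending) a PUT against a bucket's HTTP endpoint. The host, the port 8098, the bucket name `mybucket` and the `/riak/<bucket>` path are assumptions about a typical setup of that era; the module and function names come from the question.

```python
import json
import urllib.request


def postcommit_props(hooks):
    """Build a bucket-properties body whose postcommit value is a *list*
    of {"mod": ..., "fun": ...} objects, one per hook."""
    return {"props": {"postcommit": [{"mod": m, "fun": f} for m, f in hooks]}}


# The shape behind the case_clause above: a single object, not a list.
broken = {"props": {"postcommit": {"mod": "post_commit_index", "fun": "index_value"}}}

# The corrected shape: a one-element list of mod/fun objects.
fixed = postcommit_props([("post_commit_index", "index_value")])


def build_request(host: str, bucket: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) a PUT of the properties to the bucket URL."""
    return urllib.request.Request(
        f"http://{host}/riak/{bucket}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_request("127.0.0.1:8098", "mybucket", fixed)
# urllib.request.urlopen(req)  # uncomment to apply against a running node
```

Keeping the request construction separate from sending makes it easy to inspect the exact body before touching a live cluster.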
