Just a thought (and probably wrong), but couldn't you create an index key inside your bucket and, whatever the language, append each new key name (everything except the index key itself) to the list stored under it, then just query that? It does mean the double write, but it also lets you verify that the new key was inserted; a rough sketch of what I mean is below.
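Something like this against the HTTP interface (just a sketch, assuming the default /riak endpoint on port 8098; the bucket name, the "_keys" index key, and the helper names are all made up, and there's no error handling or sibling resolution):

    import json
    import urllib2

    RIAK = "http://127.0.0.1:8098/riak"

    def put_json(bucket, key, value):
        # PUT /riak/<bucket>/<key> with a JSON body.
        req = urllib2.Request("%s/%s/%s" % (RIAK, bucket, key),
                              data=json.dumps(value),
                              headers={"Content-Type": "application/json"})
        req.get_method = lambda: "PUT"
        urllib2.urlopen(req)

    def get_json(bucket, key, default=None):
        # GET the key back, returning `default` on 404 instead of raising.
        try:
            return json.load(urllib2.urlopen("%s/%s/%s" % (RIAK, bucket, key)))
        except urllib2.HTTPError, e:
            if e.code == 404:
                return default
            raise

    def store_with_index(bucket, key, value, index_key="_keys"):
        put_json(bucket, key, value)                # 1. write the object itself
        keys = get_json(bucket, index_key, default=[])
        if key not in keys:                         # 2. the "double write":
            keys.append(key)                        #    read-modify-write the index
            put_json(bucket, index_key, keys)
        return get_json(bucket, key) is not None    # 3. verify the new key landed

    store_with_index("users", "user_1234", {"name": "eric"})

It's still two round trips per write plus the verify read, which is exactly the trade-off discussed further down in the digest.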
Again probably bad but it's at least $0.02.

Christopher Villalobos

On Jul 20, 2010, at 6:00 PM, <riak-users-requ...@lists.basho.com> wrote:

> Today's Topics:
>
>    1. Re: Expected vs Actual Bucket Behavior (Justin Sheehy)
>    2. [ANN] Basho Riak 0.12.0 (Rusty Klophaus)
>    3. Re: Expected vs Actual Bucket Behavior (Eric Filson)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 20 Jul 2010 15:02:55 -0400
> From: Justin Sheehy <jus...@basho.com>
> To: Eric Filson <efil...@gmail.com>
> Cc: riak-users@lists.basho.com
> Subject: Re: Expected vs Actual Bucket Behavior
>
> Hi, Eric! Thanks for your thoughts.
>
> On Tue, Jul 20, 2010 at 12:39 PM, Eric Filson <efil...@gmail.com> wrote:
>
>> I would think that this requirement, retrieving all objects in a bucket, to be a _very_ common place occurrence for modern web development and perhaps (depending on requirements) _the_ most common function aside from retrieving a single k/v pair.
>
> I tend to see people that mostly try to write applications that don't select everything from a whole bucket/table/whatever as a very frequent occurrence, but different people have different requirements. Certainly, it is sometimes unavoidable.
>
>> In my mind, this seems to leave the only advantage to buckets in this application to be namespacing... While certainly important, I'm fuzzy on what the downside would be to allowing buckets to exist as a separate partition/pseudo-table/etc... so that retrieving all objects in a bucket would not need to read all objects in the entire system
>
> The namespacing aspect is a huge advantage for many people. Besides the obvious way in which that allows people to avoid collisions, it is a powerful tool for data modeling. For example, sets of 1-to-1 relationships can be very nicely represented as something like "bucket1/keyA, bucket2/keyA, bucket3/keyA", which allows related items to be fetched without any intermediate queries at all.
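(Just to make that shared-key pattern concrete for myself: the related pieces come back with plain GETs and nothing in between. A rough sketch against the HTTP interface; the bucket names and the user id are made up.)

    import json
    import urllib2

    RIAK = "http://127.0.0.1:8098/riak"

    def fetch(bucket, key):
        # GET /riak/<bucket>/<key> and decode the stored JSON value.
        return json.load(urllib2.urlopen("%s/%s/%s" % (RIAK, bucket, key)))

    # Hypothetical 1-to-1 buckets all keyed by the same user id:
    user_id = "user_1234"
    profile = fetch("profiles", user_id)
    settings = fetch("settings", user_id)
    avatar_meta = fetch("avatars", user_id)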
> One of the things that many users have become happily used to is that buckets in Riak are generally "free"; they come into existence on demand, and you can use as many of them as you want in the above or any other fashion. This is in essence what conflicts with your desire. Making buckets more fundamentally isolated from each other would be difficult without incurring some incremental cost per bucket.
>
>> I might recommend a hybrid solution (based in my limited knowledge of Riak)... What about allowing a bucket property named something like "key_index" that points to a key containing a value of "keys in bucket". Then, when calling GET /riak/bucket, Riak would use the key_index to immediately reduce its result set before applying m/r funcs. While I understand this is essentially what a developer would do, it would certainly alleviate some code requirements (application side) as well as make the behavior of retrieving a bucket's contents more "expected" and efficient.
>
> A much earlier incarnation of Riak actually stored bucket keylists explicitly in a fashion somewhat like what you describe. We removed this as one of our biggest goals is predictable and understandable behavior in a distributed systems sense, and a model like this one turns each write operation into at least two operations. This isn't just a performance issue, but also adds complexity. For instance, it is not immediately obvious what should be returned to the client if a data item write succeeds, but the read/write of the index fails?
>
> Most people using distributed data systems (including but not limited to Riak) do explicit data modeling, using things like key identity as above, or objects that contain links to each other (Riak has great support for this) or other data modeling means to plan out their expected queries in advance.
>
>> Anyway, information is pretty limited on riak right now, seeing as how it's so new, but talk in my development circles is very positive and lively.
>
> Please do let us know any aspects of information on Riak that you think are missing. We think that between the wiki, the web site, and various other materials, the information is pretty good. Riak's been open source for about a year, and in use longer than that; while there are many things much older than Riak, we don't see relative youth as a reason not to do things right.
>
> Thanks again for your thoughts, and I hope that this helps with your understanding.
>
> -Justin
>
> ------------------------------
>
> Message: 2
> Date: Tue, 20 Jul 2010 15:05:56 -0400
> From: Rusty Klophaus <ru...@basho.com>
> To: riak-users <riak-users@lists.basho.com>
> Subject: [ANN] Basho Riak 0.12.0
>
> Hello, Riak users. We are excited to announce the release of Riak version 0.12!
>
> Pre-built installations and source tarballs are available at:
> http://downloads.basho.com/
>
> Release notes are at (also copied below):
> http://downloads.basho.com/riak/riak-0.12/riak-0.12.0.txt
>
> Cheers,
> The Basho Riak Team
>
> -------------------------
> Riak 0.12.0 Release Notes
> -------------------------
>
> Riak now uses a new and improved mechanism for determining whether a node is fully online and ready to participate in Riak operations. This is especially important in failure recovery situations, as it allows the storage backend to complete a full integrity check and repair process. (134)
>
> Applications can now use the keywords "one", "quorum" (or "default"), and "all" in place of number values to set R, W, and DW quorum settings. This allows developers to specify intended consistency levels more clearly. (211, 276, 277, 322)
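(Side note to self on those quorum keywords: as far as I can tell they drop straight into the r/w/dw query parameters of the HTTP API in place of numbers. A sketch only; the bucket and key are made up.)

    import json
    import urllib2

    BASE = "http://127.0.0.1:8098/riak"

    # Read with R=all: every replica has to answer before the GET succeeds.
    obj = json.load(urllib2.urlopen("%s/users/user_1234?r=all" % BASE))

    # Write back with W=quorum and DW=one via the same query parameters.
    req = urllib2.Request("%s/users/user_1234?w=quorum&dw=one" % BASE,
                          data=json.dumps(obj),
                          headers={"Content-Type": "application/json"})
    req.get_method = lambda: "PUT"
    urllib2.urlopen(req)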
> The multi backend has been fixed so bitcask can be used with the other backends (274). If innostore is installed it must be upgraded to 1.0.1 if it will be used with the multi backend.
>
> Enhancements
> ------------
> 82  - HTTP API now returns 400 when quorum parameters exceed N-val.
> 83  - Default quorum parameters are now configurable in HTTP and Protobuf APIs.
> 97  - Riak now calls a BackendModule:stop/1 function, allowing cleanup during shutdown.
> 190 - HTTP API now returns 503 when Riak operation times out.
> 192 - HTTP API no longer lists keys on a bucket by default.
> 283 - HTTP API now returns 404 when an object is missing, regardless of accept header. (202)
> 216 - The "/stats" page now includes read-repair stats.
> 219 - A node now verifies that the ring_creation_size matches before joining a cluster.
> 230 - Upgrade to latest version of Mochiweb.
> 237 - Added a 'mapByFields' built-in Map/Reduce function.
> 246 - Improved error reporting in post-commit hooks.
> 251 - More descriptive error message on malformed link walking operations.
> 256 - The /stats endpoint now shows Riak version number.
> 259 - Improve python client packaging. Publish on PyPI.
> 267 - Updated bucket defaults to improve replica distribution across physical nodes.
> 274 - Improvements to storage backend interface layer.
> 365 - Use updated "rebar eunit" task for running tests.
>
> Bugs Fixed
> ----------
> 26  - The 'devrel' target now builds on CentOS.
> 27  - Fixed 'riak-admin' problem on some architectures, including Solaris.
> 138 - Fixed platform specific problems in Riak 'init.d' script.
> 205 - Fixed Bitcask errors on 32-bit Erlang. (331, 344)
> 229 - Fixed 'riak stop' error on Mac OSX Snow Leopard 10.6.3.
> 240 - Python client now properly escapes "/" in Buckets and Keys.
> 253 - Correctly pass missing object (not_found) results between Map/Reduce phases.
> 274 - Correctly forward 'info' messages from multi_backend to child backends.
> 278 - Make Riak backup work correctly in all cases when objects are deleted while backup is in progress.
> 280 - Fixed corner cases causing timestamp collision in Bitcask.
> 281 - Fixed premature tombstone collection in Bitcask.
> 301 - Fixed chunked mapreduce results to use correct line breaks (\r\n).
> 305 - Fixed possible race condition between get and Bitcask merge.
> 382 - Update Map/Reduce to honor timeout setting.
> 361 - Cleaned up Dialyzer warnings. (373, 374, 376, 381, 389)
> 402 - Make Bitcask data and hint files more resistant to corruption.
>
> Riak has been updated with the necessary changes to compile on Erlang R14A, but has not been thoroughly tested on R14A. Please continue to run Riak on R13B04 in production. (263, 264, 269)
>
> All bug and issue numbers reference https://issues.basho.com.
>
> ------------------------------
>
> Message: 3
> Date: Tue, 20 Jul 2010 18:00:14 -0400
> From: Eric Filson <efil...@gmail.com>
> To: Justin Sheehy <jus...@basho.com>
> Cc: riak-users@lists.basho.com
> Subject: Re: Expected vs Actual Bucket Behavior
>
> On Tue, Jul 20, 2010 at 3:02 PM, Justin Sheehy <jus...@basho.com> wrote:
>
>> Hi, Eric! Thanks for your thoughts.
>>
>> On Tue, Jul 20, 2010 at 12:39 PM, Eric Filson <efil...@gmail.com> wrote:
>>
>>> I would think that this requirement, retrieving all objects in a bucket, to be a _very_ common place occurrence for modern web development and perhaps (depending on requirements) _the_ most common function aside from retrieving a single k/v pair.
>>
>> I tend to see people that mostly try to write applications that don't select everything from a whole bucket/table/whatever as a very frequent occurrence, but different people have different requirements. Certainly, it is sometimes unavoidable.
>
> Indeed, in my case it is :(
>
>>> In my mind, this seems to leave the only advantage to buckets in this application to be namespacing... While certainly important, I'm fuzzy on what the downside would be to allowing buckets to exist as a separate partition/pseudo-table/etc... so that retrieving all objects in a bucket would not need to read all objects in the entire system
>>
>> The namespacing aspect is a huge advantage for many people. Besides the obvious way in which that allows people to avoid collisions, it is a powerful tool for data modeling. For example, sets of 1-to-1 relationships can be very nicely represented as something like "bucket1/keyA, bucket2/keyA, bucket3/keyA", which allows related items to be fetched without any intermediate queries at all.
>
> I agree, however, the same thing can be accomplished by prefixing your keys with a "namespace"...
>
> bucket_1_keyA, bucket_2_keyA, bucket_3_keyA
>
> Obviously, buckets in Riak have additional functionality and allow for some more complex but easier to use m/r functions across multiple buckets, etc...
>
>> One of the things that many users have become happily used to is that buckets in Riak are generally "free"; they come into existence on demand, and you can use as many of them as you want in the above or any other fashion. This is in essence what conflicts with your desire. Making buckets more fundamentally isolated from each other would be difficult without incurring some incremental cost per bucket.
>
> For me, I am more than willing to add a small amount of overhead to the storage engine for increased functionality and reduced overhead on the application layer. Again, this is obviously application specific and I'm not saying it should all be converted over for all buckets existing in their own space for every implementation, but certainly a different storage engine or configuration option to allow this level/type of access would be nice :)
>
>>> I might recommend a hybrid solution (based in my limited knowledge of Riak)... What about allowing a bucket property named something like "key_index" that points to a key containing a value of "keys in bucket". Then, when calling GET /riak/bucket, Riak would use the key_index to immediately reduce its result set before applying m/r funcs. While I understand this is essentially what a developer would do, it would certainly alleviate some code requirements (application side) as well as make the behavior of retrieving a bucket's contents more "expected" and efficient.
>>
>> A much earlier incarnation of Riak actually stored bucket keylists explicitly in a fashion somewhat like what you describe. We removed this as one of our biggest goals is predictable and understandable behavior in a distributed systems sense, and a model like this one turns each write operation into at least two operations. This isn't just a performance issue, but also adds complexity. For instance, it is not immediately obvious what should be returned to the client if a data item write succeeds, but the read/write of the index fails?
>
> Haha, these are the exact reasons I would cite as a developer for using a similar method on Riak's side... without the option of auto bucket indexing it effectively places this double write into the application side, where it requires more cycles and more data across the wire. Instead of doing a single write from the application side and allowing Riak to handle this, you have to GET index_key, UPDATE index_key, ADD new_key... So rather than having a single transaction with Riak, you have three transactions with Riak plus application functionality. Inherently, this adds another level of complexity into the application code base for something that could be done more efficiently by the DB engine itself.
>
> I would think a separate error number and message would suffice as a return error; obviously, though, this would require developers being made aware so they can code for the exception.
>
> Also, this would be optional: if the index_key wasn't set for the bucket, then this setup wouldn't be used. This would at least make the system more flexible to the application requirements and developer preferences.
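(This is also where an application-maintained index at least pays off on the read side: once you have the key list, a m/r job can be given explicit bucket/key pairs instead of a whole-bucket input. A sketch against the /mapred interface using the Riak.mapValuesJson built-in; the "users" bucket and "_keys" index key are made up.)

    import json
    import urllib2

    MAPRED = "http://127.0.0.1:8098/mapred"

    def mapred(inputs):
        # POST a MapReduce job that simply returns the stored JSON values.
        job = {"inputs": inputs,
               "query": [{"map": {"language": "javascript",
                                  "name": "Riak.mapValuesJson"}}]}
        req = urllib2.Request(MAPRED, data=json.dumps(job),
                              headers={"Content-Type": "application/json"})
        return json.load(urllib2.urlopen(req))

    # Whole-bucket input: Riak has to walk every key to find the bucket's
    # members before the map phase even starts.
    everything = mapred("users")

    # With the index key, the job gets explicit [bucket, key] pairs and
    # only those objects are read.
    keys = json.load(urllib2.urlopen("http://127.0.0.1:8098/riak/users/_keys"))
    just_these = mapred([["users", k] for k in keys])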
>
>> Most people using distributed data systems (including but not limited to Riak) do explicit data modeling, using things like key identity as above, or objects that contain links to each other (Riak has great support for this) or other data modeling means to plan out their expected queries in advance.
>>
>>> Anyway, information is pretty limited on riak right now, seeing as how it's so new, but talk in my development circles is very positive and lively.
>>
>> Please do let us know any aspects of information on Riak that you think are missing. We think that between the wiki, the web site, and various other materials, the information is pretty good. Riak's been open source for about a year, and in use longer than that; while there are many things much older than Riak, we don't see relative youth as a reason not to do things right.
>>
>> Thanks again for your thoughts, and I hope that this helps with your understanding.
>
> Some very valuable information, for me, would be seeing a breakdown of how Riak scales out...
>
> Something like showing how many keys in how many buckets takes how long with how many nodes... (extended by: now with 2 more machines, now with more complex m/r funcs, now with twice as many keys, etc...) I know this largely depends on whatever map/reduce functions are being run, but even a simple example would be nice to see. As it is, I have no idea how many queries per second of what type I can run with how many active nodes. Again, I realize this is something that needs to be benchmarked for any sort of accuracy, but I'm speaking more about targeting developers, like myself, who are looking into this as a newer technology that may work for them. It is a very large commitment of time and resources to design and implement something and then benchmark it just to answer "will it work for this application efficiently?" Having some baseline stats from which to start may prompt more developers to explore Riak as a storage solution.
>
> And one more thanks for hearing me out and for your feedback. I'd also like to reiterate that I'm coming from a limited nosql background... however, I feel that's the case with the majority of developers out there today. My recommendations for options are based on the real-world application design challenges I've personally been presented with over my career and that I feel may be common to many other developers as well. Obviously, even adding a single option such as I've mentioned is a massive undertaking on Basho's part, but these are definitely the pieces of functionality that would make me say "done, Riak it is" rather than "is there something else which would better suit my needs?"... and when vying for adoption rate, that's a major factor :)
>
> ------------------------------
>
> End of riak-users Digest, Vol 12, Issue 24