Alexander, Great post, lots of information, thanks for sharing this.
On key size, I agree it is important to understand that Riak will not retrieve partial keys. I think I read in the list that they may eventually allow partial retrieval of only header information on keys. I think this would be an excellent improvement. My time capsule buckets idea was not to add links at the bucket level, but to create time capsule objects into different buckets, for example a "day" bucket which would contain time capsule objects for every second. So a day bucket would contain 86400 time capsule objects which would either have links pointing to all realtime data objects that were gathered within that second OR contain the list of keys to these realtime data objects. I am thinking per day buckets and per second time capsule objects because this looks like a reasonable way to partition the data without creating too many buckets and not too many objects within bucket if listing keys is required to launch a mapreduce job over the complete bucket. Hourly buckets also sound good. I like your external index + Redis idea and to <<create an m/r from your own key list based off of a key index you maintain in Redis with periodic flush to it's own Riak index bucket with key names based on your own time frequency>>. The question is, how do we efficiently integrate this with the m/r framework? Will retrieving the list of keys in the application, by reading a bunch of these "index" keys (or similarly my "time capsule" keys) and passing them as the input list of a m/r job be efficient? What if we are dealing with very large key list, think hundred of thousands or millions? [... lunch ...] (hey, I'm on the east cost!) Ah! yes, of course. Sorry, I'm answering my own question here, but I just realized that having index buckets with objects that contains a key list pointing to the realtime data objects would absolutely work with a multi phase mapreduce where the first phase would select the initial set of keys by doing a key list on a whole bucket or by using the link query, then an initial map phase would simply read all the keys from the "index" or "time capsule" objects and pass it as a key list to a subsequent map phase which would actually work on the data. This initial map phase would be a dereferencing phase. Make sense? Riak+Redis for this really has legs. I will definitely look into this. As for your ElasticSearch suggestions, this is exactly what I had in mind. My current state of reflexion with it is: can I distribute the indexing of my realtime data using mapreduce jobs (and scale horizontally with it)? I think so. As you noted, everything seems to be there including an HTTP interface. Thanks again for your insights. Colin On Mon, Apr 12, 2010 at 4:18 PM, Alexander Sicular <sicul...@gmail.com> wrote: > Hi Colin, > > You present an interesting use case whose exploration would help many people > on this list. I'll comment from my own experiences with the Riak HTTP > interface, which is where I primarily work and no inside knowledge of what is > on the Riak roadmap. Note that everything you can do in HTTP land will > perform at least marginally better to possibly significantly better in native > Erlang. The way I see it, here are some of my (non-exhaustive) > considerations. Key size, which includes both data size and header size, > listing keys in a bucket, searching for keys, M/R and backend persistent > storage. I'll review them in order, then I will move on to a potential > solution. . > > Key size is a consideration because Riak will not retrieve partial keys or > only links. They seem to be housed in the same internal data structure. So > this is a consideration when thinking about how to retrieve/update/save your > data. Also every update is a wipe. So you can't update just some of the > links, you need to update all of them. You can't update some of your data, > you need to update all of it. Not that this is not a bad thing per-se, it's > just that one should be aware of these limitations when designing. While we > are on links, you specifically mention "time capsule buckets". Links are > manifested at the key level and not at the bucket level. Riak links link keys > to other keys regardless of bucket membership. > > A significant deficiency that I have found with Riak is the costly list keys > function which practically necessitates the need for an external index, re. > Redis. If you have a real-time bucket where all new records are being written > to and a periodic data flush that takes some past time slice of data and puts > it somewhere else for archival then you most definitely need an external > indexing mechanism to GET those keys if you want anything close to real-time > performance. Also, remember there is no "MOVE" function there is only a GET, > PUT/POST and DELETE function, so you'll kinda need to roll your own > transactions. I've also done this without Redis and just using a specific > "index" key in a separate stats bucket in RIak. So I would have a bucket > called "real-time-data" and another bucket called "real-time-data-stats". The > list keys function completion time increases (linearly or worse depending on > number of nodes, possibly mitigated by streaming keys?) at some function > related to total number of keys in a bucket. > > A separate inconvenience is key retrieval. You can not retrieve a key or > range of keys from Riak with a regex. You actually need the full key name. It > is harder to deform keys in the primary retrieval mechanism (Riak) but that > does not stop you from using descriptive keys like YYYYMMDDHHMMSS to your > advantage in post processing like m/r. Note that Redis does allow you to do > such key retrieval. Also a consideration is that keys are not stored > (retrieved) in any order. This is of specific concern when dealing with time > series data, or any non idempotent data. > > AFAIK, Riak will accept only buckets or key lists as input for m/r functions. > The former will simply run it's own list keys function to generate all the > keys it will iterate over. I think the more performant way to do this would > be to create an m/r from your own key list based off of a key index you > maintain in Redis with periodic flush to it's own Riak bucket index bucket > with key names based on your own time frequency. A Riak m/r itself can be > stored in a Riak key so once you generate it you can keep it somewhere and > retrieve it as easily you would a key. If these are M/R jobs that you will be > constantly executing you should look into storing them in the specific M/R > javascript script directory. This will give you the added performance > advantage of pre-caching those functions. > > Something to consider is that actual backend you will use. In my testing with > the innostore backend you need to be aware of file growth. Search the Riak > list for commentary on file growth with innostore, it may be a wiki writeup > now, not sure. In it the author talks about how you can calculate how many > files will be created on disk for every bucket that exists in Riak. This > should definitely be a consideration when thinking about the file system you > use to format your disks but more importantly the number of file descriptors > your deployment operating system can maintain and how you can change those > settings. The innostore backend configuration allows for a maximum number of > fd's. Before I got that information I managed to halt a lot of machines > during my testing phase. > > Now lets talk solutions. For instance, if you want 1 second granularity you > could write out one key with a name of YYYYMMDDHHMMSS in your stats bucket. > This key would be a dump of the identical key in Redis. This would give you > 60 * 60 keys per bucket if you roll your keys up into hourly stats buckets > and 3600 keys would not terribly blow out a list keys function on any hourly > bucket. In your case, if you do decide to go with descriptive key names I'm > not sure what advantage link walking will get you. If your primary axis is > time, then you could walk the tree in your application code without having to > rely on links. An added advantage of not having one megalithic bucket is that > replication and other Riak goodies are set on a per bucket basis. You could > possibly tweak those settings as your data ages. Lastly, if you are using > the HTTP interface to Riak, might I suggest taking a look at the > ElasticSearch project, http://github.com/elasticsearch/elasticsearch, which > will index data you pass to it over HTTP as well. This could help you with > deep internal data searching other than just time. Whatever you decide, do > keep us updated on your progress. > > I think the combination of Riak as a persistent distributed data store and > Redis as a volatile (not really due to it's VM/background save and Append > only disk writing), blazingly fast and flexible cache mechanism make a very > compelling solution to your specific use case. I would go so far as to say > that perhaps Basho should consider embedding Redis as it's internal caching > layer in Riak for just this type of problem. Riak could internally manage all > the indexing and whatnot via triggers or some internal mechanic. > > -Alexander > > > On Apr 12, 2010, at 11:56 AM, Colin wrote: > >> Hi, >> >> I am trying to figure the best model in Riak for my application. I >> have read & reread the docs, wiki and list threads but haven't put any >> possible solution to the test yet. I'd like your feedback to help me >> avoid dead end solutions if possible! >> >> The basic idea is that I want to aggregate a large amount of realtime >> data and I want to easily retrieve this data using some time >> constraints (for example, all data for the past hour, day, etc). When >> I say large amount of data I mean in the order of hundreds items per >> second and each item should be stored individually. For this I will >> have a realtime items bucket. >> >> Now, to create my "time" index I have a few ideas: >> >> - create a "time capsules" buckets, with a new capsule created at some >> given interval. My application would use a periodic data flusher to >> accumulate and flush the data for my time capsule period, every second >> for example. These time capsules would only contain links to all items >> for this interval. I was actually thinking of two way links so for any >> realtime items I could also retrieve its corresponding time capsule. >> each time capsule would also be doubly-linked to allow walking forward >> or backward in time. New time capsules buckets could be created every >> day for example. >> >> The problem I foresee with this structure is the possible number of >> links for each time capsules. In this example, I am thinking one >> capsule per second, with hundreds of realtime items resulting in >> hundreds of links in each capsule. From what I have read, it might not >> be the best of idea to deal with that many links and it will not scale >> if the rate grows to thousands of items per second. >> >> - another idea would be to directly store my realtime items keys >> within each capsule instead of using links but still use links from >> realtime items to capsule and between capsules. This structure would >> allow me to gather all realtime items for a give timeframe by fetching >> the keys list within each capsule. >> >> - yet another solution would be to use another external storage >> engine, like Redis, which supports sorted sets and store references to >> my realtime items. The problem with this is that I will not be able to >> easily launch map-reduce jobs to crunch data within a specific >> timeframe. >> >> Finally, the end goal is for me to be able to run map-reduce jobs on >> my realtime data within a give timeframe to create other external >> indexes to my data, for fulltext search for example. >> >> Any comments, hints, pointers will be appreciated! >> >> Thanks, >> Colin >> >> _______________________________________________ >> riak-users mailing list >> riak-users@lists.basho.com >> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com > > _______________________________________________ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com