Re: What kind of protocol is used between Riak nodes?
Hi, Alek.

On May 28, 2012, at 1:40 PM, Alek Morfi wrote:

> What kind of protocol is used between Riak nodes to communicate? If all
> Riak nodes are located in the same cluster (LAN network scale) there is
> no problem. But when Riak nodes are located in different clusters
> connected through the Internet, there are some limitations: some ISPs
> only allow communication via the HTTP and SMTP protocols, and I am
> wondering how Riak nodes can communicate over the Internet.

Within a single Riak cluster, nodes communicate with each other using the Erlang distribution protocol. There are a number of reasons within Riak's design -- this being just one of them -- why spreading a Riak cluster across a wide area is not recommended.

The Riak Enterprise system (http://basho.com/products/riak-overview/) uses an entirely different protocol for managing long-haul communication, and also uses a different methodology. In that system we do not spread a single cluster widely, but rather create a topology of one cluster per datacenter, with those clusters connected to each other.

-Justin
Re: Riak as Binary File Store
Hi, Praveen.

Nothing about what you have said would cause a problem for Riak. Go for it!

Justin

On May 29, 2012, at 8:36 AM, Praveen Baratam wrote:

> Hello Everybody!
>
> I have read abundantly over the web that Riak is very well suited to
> storing and retrieving small binary objects such as images, docs, etc.
>
> In our scenario we are planning to use Riak to store uploads to our
> portal, which is a social network. Uploads are mostly images with a
> maximum size of 2 MB; typical sizes range from a few KB to a few
> hundred KB.
>
> Does this usage pattern fit Riak? What are the caveats, if any?
>
> Thank you!
Re: Atomicity of if_not_modified?
On Jan 4, 2013, at 1:25 PM, Les Mikesell wrote:

> And, doesn't every description of riak behavior have to include the
> scenario where the network is partitioned and updates are
> simultaneously performed by entities that can't contact each other?
> If it weren't for that possibility, it could just elect a master and
> do real atomic operations.

Yes, absolutely. There are no atomic compare-and-set operations available from Riak, regardless of headers and R/W values. Conditional HTTP requests are present because they come "for free" with Webmachine, and they are sometimes useful, but they should not be seen as semantically very different from the client doing a read itself to decide whether to write.

-Justin
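As an illustration of that equivalence, here is a minimal sketch using the Erlang PB client (the bucket and key are invented). The if_not_modified option only compares against the vclock from the client's last read; under concurrent writers or a partition it is best-effort, not an atomic compare-and-set:

  {ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087),
  %% Read the current object, capturing its vector clock.
  {ok, Obj0} = riakc_pb_socket:get(Pid, <<"mybucket">>, <<"mykey">>),
  Obj1 = riakc_obj:update_value(Obj0, <<"new value">>),
  %% Ask Riak to reject the write if the object changed since our
  %% read -- useful, but not semantically stronger than reading and
  %% then deciding whether to write.
  ok = riakc_pb_socket:put(Pid, Obj1, [if_not_modified]).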
Re: Atomicity of if_not_modified?
On Jan 3, 2013, at 11:44 AM, Kaspar Thommen wrote:

> Can someone confirm this? If it's true, what exactly is the purpose of
> offering the if_not_modified flag?

Yes, I confirmed this earlier in this thread:

http://lists.basho.com/pipermail/riak-users_lists.basho.com/2013-January/010672.html

-Justin
Re: Stopping/Starting Riak.
On Jan 12, 2013, at 10:26 AM, Kevin Burton wrote:

> I noticed that I have no problem with ‘sudo /etc/init.d/riak stop’. But
> when I try to start the process with ‘sudo /etc/init.d/riak start’ I am
> met with a prompt for a password. What is the password? I don’t recall
> setting a password.

That is sudo, not Riak, asking for your password. You should use the same password that you use to log in to that machine.

-Justin
mailing list headers (was Re: riak cluster suddenly became unresponsive)
Hi, Ingo.

On Mar 19, 2013, at 10:41 AM, Ingo Rockel wrote:

> and the riak-users mailer-daemon should really set a "reply-to"…

Most email client programs have two well-understood controls for replies: one for "reply (to sender)" and one for "reply to all." We are not going to make one of them broken.

-Justin
Re: two-node cluster for riak?
Hi, Michael.

Your spidey-sense is absolutely correct.

Recall for a moment that Riak by default will store 3 copies of everything. This means that in a two-node configuration any given value will be stored once on one node and twice on the other. Not only does this mean a whole lot of wasted work, it removes much of the safety and availability that people look to get from Riak. If one node goes down, then for half of your keys you have lost a majority of their replicas. On only two nodes you simply can't get that kind of fault-tolerance -- with Riak or any other software.

-Justin

On Apr 18, 2013, at 11:26 AM, Michael Forrester wrote:

> Greetings Everyone,
>
> I am not sure if this is the right forum to ask this question, but here
> goes.
>
> We are currently running a six-node cluster in Amazon AWS. There has
> been some talk by our architects of going to a two-node configuration
> using SSD-backed instances with super fast hardware, but for some reason
> this is triggering my "this is not correct, but I don't remember why"
> spidey sense. From my understanding, it is best to run Riak with 5
> nodes, or at least N+2, and a two-node cluster (even though the hardware
> will be way faster) will not satisfy that.
>
> Any loose suggestions about how to approach this? I am open to the two
> ultrafast nodes... I am just not sure how to put Riak on them in a
> fault-tolerant way.
>
> Articles, dirty limericks, and soliloquies are all appreciated.
>
> --
> Michael Forrester
> Director of Infrastructure
> WorthPoint Corporation
Re: write value reality check
Hi, Louis-Philippe.

With a 2-node cluster and N=3, each value will be written to disk a total of three times: twice on one node, once on the other. (The W setting has no effect on the number of copies made or hosts used.)

That behavior might seem a bit strange, but it is a strange configuration to run Riak on only two machines while asking it to store data on three. The standard settings and behavior of Riak are generally optimized for non-tiny clusters, and make much more sense when there are at least five machines.

I hope this helps with your understanding.

-Justin

On Jun 28, 2013, at 10:54 AM, Louis-Philippe Perron wrote:

> So if I get you right, and extrapolate from the replication
> documentation page, can I say that on a 2-node cluster, with a bucket
> set to N=3 and W=ALL, my writes would be written 3 times to disk (and
> with no guarantee of being on different nodes)?
>
> thanks!
>
> On Wed, Jun 26, 2013 at 8:17 PM, Mark Phillips wrote:
>
>> Hi Louis-Philippe
>>
>> There are no dumb questions. :)
>>
>> On Wednesday, June 26, 2013, Louis-Philippe Perron wrote:
>>
>>> Hi Riak people!
>>> Here is a dumb question, but anyway I want to clear this doubt out:
>>>
>>> What happens when a bucket has a W quorum value higher than the N
>>> number of nodes? Are writes to disk multiplied?
>>
>> Precisely. For example, if you run a one-node Riak cluster on your dev
>> machine you'll be writing with an N val of 3 and W of 2 by default. In
>> other words, Riak will always attempt to satisfy the W value
>> regardless of physical node count.
>>
>> Hope that helps.
>>
>> Mark
>> twitter.com/pharkmillups
Re: Help with local restore for dev environment
Hi, Mark.

You've already received some general advice, so I won't pile on that part, but one thing stood out to me:

> My client has sent me a backup from one of their cluster nodes: bitcask
> data, rings, and config.

Unless I'm misunderstanding what you're doing, what you're working on will not get you the data from the whole cluster, but only the fraction of the data that was stored on the one node that you have a backup from. Just a warning, in case you hadn't realized this.

-Justin
Re: What is the purpose of "rel" links?
Hi, Age.

The Link header in HTTP as used by Riak is defined by RFC 5988. In the Link Relation Type registry (http://tools.ietf.org/html/rfc5988#section-6.2.2) you can see that the relation type "up" refers to a parent document in a hierarchy of documents. In Riak, this means the bucket a key is in. These are not Riak's own links, but rather an additional use of the Link header form which may be useful to some clients.

I hope that this helps.

-Justin

On Jul 12, 2013, at 1:52 PM, Age Mooij wrote:

> Hi
>
> I've been looking at links and link walking and I noticed that Riak very
> often returns a special type of link with rel="up" instead of a riaktag,
> which is illegal for users to create.
>
> What is the purpose of this link (beyond the reasonably obvious "this
> key belongs to bucket X")? Why was it added?
> Are there other "rels" than "up"?
> Can they be followed through link walking?
>
> This behavior is not documented anywhere that I (or Google) could find.
>
> Age
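For example, fetching any object over the HTTP interface returns such a header (hypothetical bucket and key; response abbreviated):

  GET /riak/images/logo.png HTTP/1.1

  HTTP/1.1 200 OK
  Link: </riak/images>; rel="up"
  ...

Here /riak/images is the parent (bucket) resource of the key, which is exactly the "up" hierarchy relation from the RFC.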
Re: Funky List-Id headers in sent messages from this list
The mailman host that is used to manage the list was moved last month, so that's probably the source of the change.

-Justin
Re: no access logs by default?
Hi, Ryan.

On Tue, Mar 1, 2011 at 8:07 PM, Ryan Zezeski wrote:

> Is this intentional? It seems like odd default behavior.

Most databases, including Riak, do not write to a file every time you do a GET, SELECT, or other query as appropriate. This is because the additional disk I/O of an access log imposes a performance cost that many do not wish to pay.

As you note, it can be turned on -- but we believe that by default production users are generally happier with it off, and do not expect such a human-readable log for database accesses.

I agree that the way to turn it on should be clearly documented. I would add that we should make sure the documentation warns people not to turn it on except in testing/debugging scenarios.

Best,

-Justin
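For anyone hunting for the knob: in this era of Riak, enabling the access log meant pointing Webmachine at its logger in app.config. A sketch -- the exact setting has varied across releases, so treat it as an assumption to verify against your version's docs:

  {webmachine, [
      %% Enable webmachine's HTTP access logger (off by default).
      {webmachine_logger_module, webmachine_logger}
  ]}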
Re: Riak search - Lucene
Hi, Joshua.

On Wed, Mar 2, 2011 at 6:26 PM, Joshua Partogi wrote:

> I am trying to picture the relationship between Riak and Lucene [or
> how Riak interacts with Lucene], which makes Riak search.

This is a very easy relationship to picture, as there is no such interaction. :-)

Riak Search does not use Lucene or Solr. It provides a very similar interface to those search systems in order to ease the transition for developers, but is an independent piece of software from top to bottom. Indexes are stored in Riak Search's own storage engine, queries are parsed by Riak Search's parser, and so on. The closest thing there is to such a relationship is that you can (but do not need to) use the same text analyzer libraries in Riak Search that you use in Lucene.

I hope that this helps with your understanding.

Best regards,

-Justin
Re: A script to check bitcask keydir sizes
Hi, Greg.

On Thu, Mar 24, 2011 at 10:17 AM, Greg Nelson wrote:

> Wouldn't it be the common case that there are relatively few buckets?
> And so wouldn't it save a lot of memory to keep a reference to an
> interned bucket name string in each entry, instead of the whole bucket
> name?

One reason this isn't done is that bitcask is an independent application, used-by rather than part-of Riak. It's just a local kv store, and knows nothing of higher-level concepts like buckets. Another reason is that there are also users with very many buckets in use, a situation that makes the proposed solution uncomfortable.

In cases where there are truly few buckets and one knows it would stay that way, one could plausibly modify riak_kv_bitcask_backend (the part of Riak that talks to Bitcask) to use a bitcask per bucket on each vnode instead of a single bitcask per vnode. One downside of that approach would be that if the number of buckets did grow, then the file descriptor consumption would be large and the node-wide I/O profile might be much worse as well. Everything has tradeoffs.

-Justin
Re: Riak vs riak_core
Hi, Mike.

On Wed, Mar 30, 2011 at 5:46 PM, Mike Oxford wrote:

> I thought I understood Riak, then I ran across the fact that riak_core
> was split out separately. When would you use riak_core that you
> wouldn't use Riak?

Good question. Riak Core is the distributed-systems center that Riak is built around. Riak Core is not a standalone database; in fact, by itself it doesn't do data storage, or even much of anything at all from the point of view of a client application. You use Riak to store, query, and retrieve your data. You use Riak Core to build something shaped a bit like Riak.

Another way of looking at this is that Riak Core is a bit more abstract, providing mechanisms such as vector clocks, gossip, and the other useful building blocks of the servers in a robust and scalable system. Riak, the database, builds on that core by adding a client-facing storage and retrieval protocol, storage engines for placing data on disk, and so on.

I hope that this helps to clarify matters. If not, or even if you just have additional questions, please ask.

Best regards,

-Justin
Re: Load question
Hi, Runar.

On Tue, Apr 12, 2011 at 3:22 AM, Runar Jordahl wrote:

> It would be helpful if a wiki page (under Best Practices) was created
> to discuss various load balance configurations. I am also wondering if
> a Riak client could use strategy (2), like Dynamo clients can.

There is not currently any client that uses strategy #2 of partition-awareness. To make it practical, we would need to extend the client-facing protocol so that an incoming client could ask to be redirected to an "ideal" node. This is quite doable, though it would have the downside of making such clients more complex and thus possibly more fragile.

-Justin
Re: Bitcask vs innostore, again
Hi, Dmitry.

I will try to reply to some of the questions you raised about bitcask.

On Thu, Apr 7, 2011 at 12:30 AM, Dmitry Demeshchuk wrote:

> Now being considered as the main Riak storage.

It's not just being considered; it is the main Riak storage. We are very confident in bitcask's quality, and it has been the default storage engine for some time now. Some people may of course still choose innostore for various reasons, but at Basho we believe that bitcask will better suit the needs of the majority of users.

> I've been having myself some problems with bitcask previously (running
> out of file descriptors, bad merges) and heard that some people
> periodically try to migrate from innostore to bitcask, then stick with
> innostore, remaining disappointed in bitcask.

We honestly don't hear of many real problems with bitcask. It is true that, depending on your setup, Riak can quickly run out of file descriptors if you haven't set your ulimit properly, but that is easily fixed (and is also true under innostore, just in slightly different scenarios). I am not sure what you mean by bad merges or any failed migrations -- I'd need to hear more details to reply to that part.

> What I haven't heard about bitcask yet is any production success
> stories. Which storage does Wikia use, for example? Or Vibrant Media?

I will leave it to each individual user to describe any details of their own production configuration, as that is not our privilege to disclose. However, I can certainly say that the majority of production deployments are running bitcask. There are a few notable exceptions, certainly -- but bitcask is the typical storage engine for Riak in production these days. This certainly includes a number of businesses with the volume and duration you described.

Others might share their anecdotes; what I can provide is an aggregate view. And from that perspective we are very happy with the performance and stability that bitcask's known users are experiencing.

Best regards,

-Justin
Re: A function as an input for map/reduce
Hi, Mikhail.

On Tue, May 3, 2011 at 5:55 PM, Mikhail Sobolev wrote:

> Is there more information about "it can through a few keys at a time,
> and the map/reduce chain would go ahead and start doing the processing
> on whatever keys it gets as soon as it gets them, it does not have to
> wait for the whole list of that function" (@ ~9:54 in the video)? What
> I'm concerned about here is a chain of map/map/map/reduce/reduce
> phases. How is the processing actually performed? What are the
> synchronization points?

The "map" part of the MapReduce programming paradigm is not only inherently parallel, it also does not impose a point of order on the overall dataflow, and thus does not introduce a concurrency barrier. In practical terms this means that individual data items can be processed as soon as they arrive, and the results can be immediately pushed on to the next phase of the overall job without waiting for all other data to make it through the map.

The "reduce" part does not have this pleasant property, as that phase is present in order to perform exactly the kinds of operations (such as counting) that do require waiting.

-Justin
Re: A function as an input for map/reduce
Hi, Mikhail.

On Thu, May 5, 2011 at 5:15 PM, Mikhail Sobolev wrote:

> Thank you for the description. I now wonder if it's possible for a
> map-function, instead of returning the whole list of results, to do
> something that Riak would take as "ah! another map result, let's pass
> it to the next phase"?

It is quite possible in Riak to have a map phase followed by another map phase. You simply have to declare the job as having those phases, each with its own map function. The way you showed it wouldn't quite work, as it is the return value -- not a side effect -- that a map function passes on to the following phase.

-Justin
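To make that concrete, here is a minimal sketch of a two-map-phase job using the Erlang PB client (the connection details and bucket/key are invented; the phase functions are the stock ones from riak_kv_mapreduce):

  {ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087),
  {ok, Results} = riakc_pb_socket:mapred(Pid,
      [{<<"mybucket">>, <<"mykey">>}],
      [%% First map phase; 'false' streams its results onward to the
       %% next phase instead of back to the client.
       {map, {modfun, riak_kv_mapreduce, map_object_value}, none, false},
       %% Second map phase; 'true' returns its results to the client.
       {map, {modfun, riak_kv_mapreduce, map_identity}, none, true}]).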
Re: Make Riak use different Erlang version than that included in the .deb package
Hi, Jeremy.

If you build Riak from source, you'll end up with Riak using the version of Erlang that you used to build it. With a pre-packaged version, it will use the Erlang that was used to make the packages. Riak will be moving to a newer Erlang in upcoming releases, by the way.

-Justin

On Thu, May 12, 2011 at 4:00 PM, Jeremy Raymond wrote:

> I'm using Riak installed from riak_0.14.0-1_amd64.deb but am having a
> problem with an Erlang reduce function I wrote, because it uses
> calendar:iso_week_number/1 which isn't available in R13B04, the version
> bundled with the .deb package. Is there an easy way to configure Riak
> to use a different Erlang install (say, installed at
> /usr/local/lib/erlang)?
>
> - Jeremy
Re: Production Backup Strategies
Hi, Mike.

Assuming that the cluster is using the default storage engine (bitcask), then the backup story is straightforward. Bitcask only ever appends to files, and never re-opens a file for writing after it is closed. This means that your favorite existing server filesystem backup mechanism will Just Work. Other means exist, but that is the simplest and often the best.

-Justin
Re: Production Backup Strategies
Hi, Jeremy.

On Sat, May 14, 2011 at 2:45 PM, Jeremy Raymond wrote:

> So just backing up the files from separate nodes works? There won't be
> inconsistencies in the data, say, if all the nodes had to be restored?

That's right, it works. :-)

Inconsistencies due to modifications that occur between the moments two different nodes are backed up will be fixed by anti-entropy mechanisms such as read-repair.

-Justin
Re: Issues with capacity planning pages on wiki
Hi, Anthony.

There are really three different things below:

1- reducing the minimum overhead of the {Bucket, Key} encoding when Riak is storing into bitcask
2- reducing the size of the vector clock encoding
3- reducing the size of the overall riak_object structure and metadata

All three of these are worth doing. The reason they are the way they are now is that the initial assumption for most Riak deployments was of a high enough mean object size that these few bytes per object would proportionally be small noise -- but that's just history and not a reason to avoid improvements. In fact, preliminary work has been done on all three of these. It just hasn't yet been such a high priority that it got pushed through to the finish. One tricky part with all three is backward compatibility, as most production Riak clusters do not expect to need a full stop every time we want to make an improvement like these.

Solving #1, by the way, isn't really in bitcask itself but rather in riak_kv_bitcask_backend. I can take a swing at that (with backward compatibility) shortly. I might also be able to help dig up some of the old work on #2 that is nearly a year old, and I think Andy Gross may have done some of what's needed for #3.

In fewer words: I agree, all this should be made smaller. And don't let this stop you if you want to jump ahead and give some of it a try!

-Justin

On Wed, May 25, 2011 at 1:50 PM, Anthony Molinaro wrote:

> Anyway, things make a lot more sense now, and I'm thinking I may need
> to fork bitcask and get rid of some of that extra overhead. For
> instance, 13 bytes of overhead to store a tuple of binaries seems
> unnecessary; it's probably better to just have a single binary with the
> bucket size as a prefix, something like
>
> <<BucketSize:16, Bucket/binary, Key/binary>>
>
> That way you turn 13 bytes of overhead into 2.
>
> Of course I'd need some way to work with old data, but a one-time
> migration shouldn't be too bad.
>
> It also seems like there should be some way to trim down some of that
> on-disk usage. I mean, 300+ bytes to store 36 bytes is a lot.
Re: Riak doesn't use consistent hashing.
Hi, Greg.

Thanks for your thoughtful analysis and the pull request.

On Thu, May 26, 2011 at 1:54 AM, Greg Nelson wrote:

> However, the skipping bit isn't part of Riak's preflist calculation.
> Instead, nodes claim partitions in such a way as to be spaced out by
> target_n_val, to obviate the need for skipping.

A fun bit of history here: once upon a time, Riak's claiming worked in the same way as described by Amazon, with "skipping" and all. We noticed that this approach caused a different set of operational difficulties when hinted handoff due to node outages was occurring at the same time as a membership change. That prompted changes to the claim algorithm, which we still consider an area deserving of active improvement.

Multiple people will be reading, analyzing, and testing your work to contribute to this improvement. We very much appreciate your efforts, and want to make sure that we incorporate them in the best possible way.

Thanks,

-Justin
Re: riak locking and out of memory
Hi, Ron.

On Thu, May 26, 2011 at 4:33 PM, Ron Yang wrote:

> On the macbook I looped across 400meg files using bash and curl to
> upload them as documents into a bucket:

There are other details in your post that I might comment on, but I will focus on the main point. What you describe here simply will not work. Single documents in Riak at that size are going to cause problems. There is an interface atop Riak ("Luwak") which can handle such things just fine, if large file storage is your main use case.

-Justin
Re: A script to check bitcask keydir sizes
On Thu, Mar 24, 2011 at 1:51 PM, Nico Meyer wrote:

> The bigger concern for me would be the way the bucket/key tuple is
> serialized:
>
> Eshell V5.8 (abort with ^G)
> 1> iolist_size(term_to_binary({<<>>,<<>>})).
> 13
>
> That's 13 bytes of overhead per key where only 2 bytes are needed with
> reasonable bucket/key length limits of 256 bytes each. Or, if that is
> not enough, one could also use a variable-length encoding, so
> buckets/keys can be arbitrarily large and the most common cases (less
> than 128 bytes) still only use 2 bytes of overhead.

I've made a branch of bitcask that effectively does this. It uses 3 bytes per record instead of 13, saving 10 bytes (both in RAM and on disk) per element stored.

The tricky thing, however, is backward compatibility. There are many Riak installations out there with data stored in bitcask using the old key encoding, and we shouldn't force them all to do a very costly full sweep of their existing data in order to get these savings. When we sort out the best way to manage a smooth upgrade, I will happily push out the smaller encoding.

-Justin
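For the curious, the idea is roughly the following (a sketch, not the actual branch; real code would also need the upgrade handling described above):

  %% Old encoding: term_to_binary adds ~13 bytes of framing.
  encode_old(Bucket, Key) ->
      term_to_binary({Bucket, Key}).

  %% Sketched new encoding: one version byte plus a 2-byte bucket
  %% length, i.e. 3 bytes of overhead. term_to_binary output always
  %% starts with byte 131, so any other leading byte safely marks
  %% new-format records alongside old ones.
  encode_new(Bucket, Key) when byte_size(Bucket) < 65536 ->
      <<1, (byte_size(Bucket)):16, Bucket/binary, Key/binary>>.

  decode(<<131, _/binary>> = Old) ->
      binary_to_term(Old);
  decode(<<1, Len:16, Bucket:Len/binary, Key/binary>>) ->
      {Bucket, Key}.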
Re: Pruning (merging) after storage reaches a certain size?
Hi, Steve.

Check out this page:

http://wiki.basho.com/Bitcask-Configuration.html#Disk-Usage-and-Merging-Settings

Basically, a "merge trigger" must be met in order for the merge process to occur. When it does occur, it will affect all existing files that meet a "merge threshold."

One note that is relevant for your specific use: the expiry_secs parameter will cause a given item to disappear from the client API immediately after expiry, and to be cleaned if it is in a file already being merged, but it will not currently contribute toward merge triggers or thresholds on its own if not otherwise "dead".

-Justin

On Jun 7, 2011, at 4:29 PM, Steve Webb wrote:

> Hello there.
>
> I'm loading a 2-node (1GB mem, 20GB storage, vmware VMs) riaksearch
> cluster with the spritzer twitter feed. I used the bitcask
> 'expiry_secs' to expire data after 3 days.
>
> I'm curious - I'm up to about 10GB of storage and I'm guessing that
> I'll be full in 3-4 more days of ingesting data. I have no idea if/when
> a merge will run to expire the older data.
>
> Q: Is there a method or command to force a merge at any time?
> Q: Is there a way to run a merge when the storage size reaches a
> specific threshold?
>
> - Steve
>
> --
> Steve Webb - Senior System Administrator for gnip.com
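For reference, those triggers and thresholds live in the bitcask section of app.config. A sketch with the defaults as I recall them -- verify the exact names and values against the page above for your release:

  {bitcask, [
      %% Merge when any file is more than 60% dead data...
      {frag_merge_trigger, 60},
      %% ...or contains more than 512MB of dead bytes.
      {dead_bytes_merge_trigger, 536870912},
      %% Once a merge is triggered, include every file that is more
      %% than 40% fragmented or has more than 128MB of dead bytes.
      {frag_threshold, 40},
      {dead_bytes_threshold, 134217728}
  ]}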
Re: Pruning (merging) after storage reaches a certain size?
Hi, Steve.

The key to your situation was in my earlier email:

> One note that is relevant for your specific use: the expiry_secs
> parameter will cause a given item to disappear from the client API
> immediately after expiry, and to be cleaned if it is in a file already
> being merged, but it will not currently contribute toward merge
> triggers or thresholds on its own if not otherwise "dead".

That is, bitcask wasn't originally designed around the expiry-centric way of removing old data, and data that has simply expired (but not actively been deleted) will not be counted as garbage toward thresholds or triggers at this time. It will be cleaned up in a merge, but will not contribute toward causing that merge in the first place. In a use case where you only add items and never actually delete anything, a merge will never be dynamically triggered.

It is plausible that we could add some expiry-statistics measurement and triggering to bitcask, but today that's the state of things. You could manually trigger merges, but that currently requires a bit of Erlang.

I hope that this helps.

-Justin
Re: Riak Ruby Client Thread Safe?
Hi, Keith.

It is not safe to share a single Riak client instance across multiple client-facing threads. Riak's conflict detection mechanisms will be misled by that sort of sharing. Luckily, the client is quite lightweight, so you shouldn't have to worry about the cost of doing it right.

-Justin

On Wed, Jun 15, 2011 at 2:05 PM, Keith Bennett wrote:

> Hi, all. Is the Ruby Riak::Client thread safe? I'm wondering if it's
> safe to share a single Riak::Client instance across all threads in an
> application. I might run the app in JRuby, by the way.
>
> Are there any pros and cons to sharing a single client you can offer?
>
> An obvious pro is that it saves some memory, but probably an
> insignificant amount.
>
> Thanks,
> Keith
Re: Benchmarks of backends
Hi, Anthony.

Most people using Riak today use either Bitcask or Innostore, as I suspect you know. Bitcask has excellent performance, but the limitation you are aware of: a hard cap on the number of keys per unit of available RAM. Innostore does not have that limitation, but it is much harder to achieve equivalent performance with it.

You've noticed that multiple people (including Basho's own Dizzy and also the estimable Paul Davis) have produced wrappers for LevelDB, and indeed we are currently evaluating this as another alternative storage engine behind Riak. We will be posting some performance thoughts on LevelDB shortly, and generally it looks promising. The main blocker at this point is portability; we would like for the backend to run well on all of Riak's existing main platforms. Expect more from us on this soon.

The short answer is that if you have too many keys for bitcask, the answer today is usually Innostore, but soon it might be LevelDB instead.

Best,

-Justin

On Jun 17, 2011, at 7:12 PM, Anthony Molinaro wrote:

> Hi,
>
> I'm wondering if anyone has done any testing with regards to memory
> usage of various backends. After recent emails about the large overhead
> of bitcask keydir indexes, and by comparing with my current production
> nodes, I find that the overhead per key ends up being too large for
> small keys.
>
> So I'm in the market for a new backend, and was wondering if anyone out
> there has done any measurements on memory overhead per key, and access
> times.
>
> I'm also wondering if there are any backends floating out there I
> haven't found. I've done some google searches to come across
>
> https://github.com/krestenkrab/riak_btree_backend
> https://github.com/cstar/riak_redis_backend
>
> but I'm assuming there might be others.
>
> Also, I figure it would be interesting to understand the overhead for
> the built-in backends and innostore, and possibly look at other stores
> I've found which seem to have erlang wrappers, like
>
> LevelDB:
> https://github.com/basho/e_leveldb
> https://github.com/davisp/erleveldb
> Tokyo Cabinet:
> https://github.com/rabbitmq/toke
> Berkeley DB:
> https://github.com/krestenkrab/bets
>
> So, anyone know anything about these backends or other k/v stores in
> terms of memory versus disk for large datasets?
>
> The thing prompting this is a cassandra cluster with about 14 billion
> entries (7 billion with replication factor of 2), which uses 60
> machines. I was trying to determine how many bitcask-backed machines it
> would take to store this data and it ends up being about 150. This is
> mostly because of the 84 bytes of overhead per key (43 bytes by
> calculations determined on this list a few weeks ago, another 41 by
> measuring my current production setup). Even with keys of 17 bytes,
> that's 101 bytes of overhead, so I'm just wondering if there's anything
> better.
>
> Anyway, I'm trying to get some hardware to run basho_bench with and
> will try out some different things, but if anyone has done any of this
> work already it might be interesting to know.
>
> Thanks,
>
> -Anthony
Re: LevelDB driver
Hi, Jonathan.

On Mon, Jul 4, 2011 at 9:42 AM, Jonathan Langevin wrote:

> I've seen users show concern about Bitcask's space usage overhead. How
> does that compare against LevelDB?

Bitcask doesn't have much in the way of disk space "overhead" unless you mean that the space used by deleted or overwritten values is not reclaimed until after a merge. In that way LevelDB is similar, since space used by deleted and overwritten items is reclaimed as they are moved into older "levels" of the DB. The behavior here is not identical, but similar in concept. By way of comparison, InnoDB imposes about a 2x space overhead cost on many common datasets, but that overhead is usually fairly static.

> If using a Level backend, what advantages of Bitcask do we lose? Is
> replication & availability an issue at all?

The functionality provided by Riak above the storage engines (such as replication and system-wide availability) is generally not impacted by your choice of storage engine. There are two main things you would lose today:

1 - latency
2 - stability

The first of these is fundamental: for many usage patterns Bitcask will have a latency advantage over LevelDB due to being able to guarantee that it will never perform more than a single disk seek per operation. The second is just about the relative immaturity of LevelDB: we have not yet seen LevelDB in production environments for an extended amount of time as we have with Bitcask. Anyone using it now as a Bitcask replacement should realize that they are on the leading edge and taking the usual risks that come with adopting new software. That said, we expect LevelDB to do well over time as one of the alternative Riak storage engines.

The main reason to use LevelDB under Riak would be if your number of keys is huge and thus the RAM consumption of Bitcask would make it unsuitable. That is, we expect people to use LevelDB in the same situations where they might previously have chosen Innostore as their storage engine.

-Justin
Re: LevelDB driver
On Mon, Jul 4, 2011 at 10:33 AM, Jonathan Langevin wrote:

> Thanks Justin for the helpful response :-)

Happy to help.

> Can you define what you would consider "huge" regarding # keys?

A bit depends on the details (such as key size), but generally the tipping point is somewhere near ten million keys per GB of RAM.

-Justin
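That figure lines up with the per-key keydir overhead numbers discussed earlier on this list (roughly 100 bytes of RAM per key). A rough worked example, with the caveat that ~100 bytes is an approximation rather than a measured constant:

  1 GB RAM / ~100 bytes per key      ≈ 10,000,000 keys per node
  1 billion keys with N=3 replicas   = 3 billion keydir entries
  3,000,000,000 x ~100 bytes         ≈ 300 GB of RAM across the cluster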
Re: LevelDB driver
Hi, Phil.

I might have caused a little confusion. I mentioned, but perhaps didn't sufficiently emphasize, that the benchmark comparing LevelDB to InnoDB was not a benchmark of Riak at all, but rather direct use of the storage engines in order to look at the feasibility of doing more with LevelDB. That is why there is no mention of how many nodes or any such thing: it was a one-machine test of embedded storage engines.

The data was generated by basho_bench during the tests, initially using the sequential_int_gen generator and then using the pareto generators for subsequent access.

As LevelDB becomes more fully supported as a backend for Riak, we will certainly publish directions and examples for configuration.

-Justin
Re: How much memory for 20GB of data?
Hi, Maria.

In addition to what others have said, I would note that (at least) the following issues matter quite a bit for such planning:

- how many items the data is broken up into
- how large the keys will be (especially if they are very large due to embedded structure)
- what storage engine ("backend") is in use
- how many machines are in the cluster
- the N-val, or how many replicas are being stored (default is 3)

If you know those things, then you can make a more meaningful estimation.

I hope that this helps.

-Justin

On Thu, Jul 14, 2011 at 6:02 PM, Maria Neise wrote:

> Hey,
> I would like to store 20GB of data with Riak. Does anyone know how much
> memory Riak would need for that?
>
> Cheers,
> Maria
Re: How much memory for 20GB of data?
Do you perhaps mean disk space instead of memory? If so, and if you have left the N-val at the default of 3, then you will need at least 60GB of space before any other overhead is accounted for.

-Justin

On Thu, Jul 14, 2011 at 7:04 PM, Maria Neise wrote:

> Hey,
> thank you a lot for your hints. I have 20,000,000 records à 1KB. The
> key is a string like "user123456789". I am using the default backend
> bitcask. There is just one machine in the cluster and I didn't change
> the N-val. I already tried to insert the 20GB of data, but 40GB of
> memory were obviously not enough, because only 7,000,000 records were
> inserted. So I thought maybe 150GB should be enough?
>
> Cheers,
> Maria
Re: Connection Pool with Erlang PB Client Necessary?
The simplest guidance on client IDs that I can give:

If two mutation (PUT) operations could occur concurrently or without awareness of each other, then they should have different client IDs.

As a result of the above: if you are sharing a connection, then you should use a different client ID for each separate user of that connection.

-Justin
Re: Connection Pool with Erlang PB Client Necessary?
Yes, Andrew -- that is a fine approach to using a connection pool. Go for it.

-Justin

On Tue, Jul 26, 2011 at 3:18 PM, Andrew Berman wrote:

> Thanks for all the replies guys!
>
> I just want to make sure I'm totally clear on this. Bob's solution
> would work well with my design. So basically, this would be the
> workflow?
>
> 1. check out connection from the pool
> 2. set client id on connection (which would have some static and some
>    random component)
> 3. perform multiple operations (gets, puts, etc.) which would be seen
>    as a single "transaction"
> 4. check in the connection to the pool
>
> This way once the connection is checked out from the pool, if another
> user comes along he cannot get that same connection until it has been
> checked back in, which would meet Justin's requirements. However, each
> time it's checked out, a new client id is created.
>
> Does this sound reasonable and in line with proper client id usage?
>
> Thanks again!
>
> Andrew
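A minimal sketch of that checkout workflow with the Erlang PB client (the pool module and functions are hypothetical placeholders for whatever pool library is in use):

  with_riak(PoolRef, Fun) ->
      %% 1. Check a connection out of the pool (pool API assumed).
      Pid = my_pool:checkout(PoolRef),
      try
          %% 2. Give this checkout its own client ID: a static node
          %%    component plus a random component.
          ClientId = <<"node1-", (crypto:rand_bytes(4))/binary>>,
          ok = riakc_pb_socket:set_client_id(Pid, ClientId),
          %% 3. Perform the gets/puts of this logical "transaction".
          Fun(Pid)
      after
          %% 4. Return the connection for the next user.
          my_pool:checkin(PoolRef, Pid)
      end.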
Re: riak_core questions
Hi, Dmitry.

A couple of suggestions...

The reason that you're not seeing an easy way to automatically have nodes added to or removed from the cluster upon going down or coming up is that we recommend strongly against such behavior. The idea is that intentional (administrative) outages are very different in nature from unintentional and potentially transitory outages. We have explicit administrative commands such as "join" and "leave" for the administrative cases, making it very easy to add or remove hosts in a cluster.

When a node is unreachable, you often can't automatically tell whether it is a host problem or a network problem, and can't automatically tell whether it is a long-term or short-term outage. This is why mechanisms such as quorums and hinted handoff exist: to ensure proper operation of the cluster as a whole throughout such outages. Consider the case where you have a network problem such that several of your nodes lose visibility to each other for brief and distinct periods of time. If nodes are auto-added and auto-removed, then you will have quite a bit of churn and potentially a very harmful feedback scenario.

Instead of auto-adding and auto-removing, consider using things like riak_core_node_watcher to decide which nodes to interact with on a per-operation basis.

I'm also not sure what you mean by "if the master node goes down", since in most riak_core applications there is no master node. Of course you can create such a mechanism if you need it, but (e.g.) Riak KV and the accompanying applications have no notion of a master node and thus no such concern.

I hope that this is useful.

Best regards,

-Justin
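As an illustration of that last suggestion (a sketch; the service atom is whatever your riak_core application registered with the node watcher):

  pick_node(Service) ->
      %% Nodes currently up and running the given service, according
      %% to the node watcher -- no auto join/leave involved.
      UpNodes = riak_core_node_watcher:nodes(Service),
      %% Choose one per operation; random here, but round-robin or
      %% partition-aware choices work the same way.
      lists:nth(random:uniform(length(UpNodes)), UpNodes).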
Re: riak_core questions
Hi, Dmitry.

On Thu, Jul 28, 2011 at 12:22 PM, Dmitry Demeshchuk wrote:

> By master node, I mean the one that is used when we are joining new
> nodes using riak-admin (as far as I remember, only one node can be used
> for this).

You can use any node at all in the existing cluster for this purpose. They are all effectively identical.

-Justin
Re: Getting a value: get vs map
Jeremiah,

You were essentially correct. A "targeted" MR does not have to search for the data, and does not slow down with database size. It is a bucket-sweeping MR that currently has that behavior.

-Justin

On Fri, Jul 29, 2011 at 10:27 AM, Jeremiah Peschka wrote:

> I would have suspected that an MR job where you supply a Bucket, Key
> pair would be just as fast as a Get request. Shows what I know.
>
> ---
> Jeremiah Peschka
> Founder, Brent Ozar PLF, LLC
>
> On Jul 29, 2011, at 1:37 AM, Antonio Rohman Fernandez wrote:
>
>> MapReduce (or simply a Map) gets really slow when the database has a
>> significant amount of data (or is distributed over several servers).
>> Get instead is always faster, as Riak doesn't have to search for the
>> key (you tell Riak exactly where to GET the data in your url).
>>
>> Rohman
>>
>> On Thu, 28 Jul 2011 23:43:06 +0400, m...@mawhrin.net wrote:
>>
>>> Hi,
>>>
>>> (I looked at various places for the information, however I could not
>>> find anything that would answer the question. It's not completely
>>> ruled out that not all places were checked though. :))
>>>
>>> I use the PB erlang interface to access the database. Given a bucket
>>> name and a key, the value can easily be extracted using:
>>>
>>> {ok, Object} = riakc_pb_socket:get(Conn, Bucket, Key),
>>> Value = riakc_obj:get_value(Object)
>>>
>>> Alternatively, a mapred (actually, just map) request could be issued:
>>>
>>> {ok, [{_, Value}]} = riakc_pb_socket:mapred(Conn, [
>>>     {Bucket, Key}
>>> ], [
>>>     {map, {modfun, riak_kv, map_object_value}, none, true}
>>> ])
>>>
>>> I would expect that the result is the same while in the second case,
>>> the amount of data transferred to the client is smaller (which might
>>> be good for certain situations).
>>>
>>> So the [open] question is: are there any reasons for using the first
>>> approach over the second?
>>>
>>> --
>>> Misha
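To restate the distinction in code (bucket and keys invented): the first input form below is "targeted" and stays fast as the database grows; the second sweeps the whole bucket and does not:

  %% Targeted: each {Bucket, Key} hashes directly to its vnodes.
  {ok, R1} = riakc_pb_socket:mapred(Conn,
      [{<<"mybucket">>, <<"key1">>}, {<<"mybucket">>, <<"key2">>}],
      [{map, {modfun, riak_kv_mapreduce, map_object_value}, none, true}]),

  %% Bucket sweep: Riak must first list every key in the bucket,
  %% which slows down as the total amount of data grows.
  {ok, R2} = riakc_pb_socket:mapred_bucket(Conn, <<"mybucket">>,
      [{map, {modfun, riak_kv_mapreduce, map_object_value}, none, true}]).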
Re: how can I trigger a manual merge?
A direct call to bitcask:merge could force all of the files to be processed, including the removal of expired entries. That won't happen under normal Riak operation, as none of the triggers will be crossed by your usage pattern, but you could certainly write a script to do it directly.

-Justin

On Fri, Jul 29, 2011 at 7:36 PM, Steve Webb wrote:

> So, I'm still working on an "insert and never delete" use of riak. I'm
> expiring data after a certain amount of time, but from what I've
> heard/read, it's not possible to trigger a merge at all with my usage
> pattern.
>
> So, is there a way for me to write something in erlang or something
> that I can throw into cron to do periodic merges and clean things up?
>
> - Steve
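A minimal sketch of such a script (the data root path is an assumption; each vnode keeps its own bitcask directory under that root):

  %% Run from `riak attach`, or an escript with bitcask in its code
  %% path. Merging every vnode directory also reclaims entries that
  %% are dead only due to expiry.
  Root = "/var/lib/riak/bitcask",
  {ok, Dirs} = file:list_dir(Root),
  [ok = bitcask:merge(filename:join(Root, D)) || D <- Dirs].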
Re: Understanding put if_not_modified.
Hi, Igor.

Riak (quite intentionally, for availability reasons) does not provide any sort of global transactions or user-exposed locking. One result of this is that you can't do exactly what you tried -- or at least not that simply.

You might be interested in https://github.com/mochi/statebox

-Justin
Re: automatically expiring keys with LevelDB?
On Oct 21, 2011, at 4:22 PM, Nate Lawson wrote:

> I know Bitcask has the expiry_secs option for expiring keys, but what
> about LevelDB? We're thinking of using Luwak as a file cache frontend
> to S3, and it would be nice for older entries to be deleted in LRU
> order as we store newer files. This could be implemented as a storage
> quota also (high/low water mark).

There is no functionality like this in LevelDB at this time. Also, I do not recommend using bitcask's expiry beneath Luwak unless you are prepared to deal with the fact that parts of a Luwak object might disappear before others.

-Justin
Re: Moving Riak bitcask directory
Stephen,

That should work fine.

-Justin

On Nov 23, 2011, at 11:05 AM, Stephen Bennett wrote:

> I want to move my Riak bitcask directory onto a different filesystem
> partition in order to make use of more space that is available.
>
> Is it as simple as:
>
> 1. Stopping Riak
> 2. Moving the directory to the new partition
> 3. Sym-linking the directory to the old location
> 4. Starting Riak
>
> Is there a better way to do this, and is there anything that I should
> be looking out for when doing this?
Re: Bitcask won't merge without explicit merge() call
Dmitry,

What you are expecting is Bitcask's normal behavior, though I can see why it might not be what you expected. Bitcask does not quite auto-merge; instead it provides you with the tools to easily decide when a merge is needed, and to easily have a merge scheduled when you wish.

Does this example of usage clarify it for you?

https://github.com/basho/riak_kv/blob/master/src/riak_kv_bitcask_backend.erl#L371-374

We should probably create better documentation for this aspect of Bitcask usage in any case.

-Justin
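The pattern in that linked code amounts to the following simplified sketch: periodically ask each open bitcask whether a merge is worthwhile and, if so, hand the work off (my_merge/2 below is a stand-in for whatever does the merging; in Riak it is a dedicated merge worker):

  maybe_merge(BitcaskRef, Dir) ->
      case bitcask:needs_merge(BitcaskRef) of
          {true, Files} ->
              %% Some files crossed a trigger; merge just those files.
              my_merge(Dir, Files);
          false ->
              ok
      end.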
Re: Open ticket for configurable R-value in MapReduce?
Elias,

On Dec 14, 2011, at 5:32 PM, Elias Levy wrote:

> If you add a node, that node will be empty. If MR chooses the new node,
> the choice of R=1 will cause it to think there is no data to process.
> As time goes on that node will gain new data or be populated by
> read-repair, but it will still not have a complete data set until
> either all previous data has been read, updated, or deleted.

That is not the case. The new node will be populated by its peers in order to fill up its newly-owned vnodes with the appropriate data.

> Just to confirm, you are saying that existing KV and Search data will
> be redistributed within a cluster when you add a new node?

That is indeed what he was saying, yes.

I hope that this clarification is helpful for you.

Best,

-Justin
Re: Python-riak links?
They are just stored in the metadata field of the object; what you describe is roughly equivalent, except that link traversal can occur without round trips between Riak and your client.

Justin

On Dec 24, 2011, at 11:38 AM, Shuhao Wu wrote:

> How are the links implemented?
>
> Would it be faster if I just store the unicode key in the db and look
> it up, or should I use links instead?
>
> Thanks,
>
> Shuhao
Re: Absolute consistency
On Jan 10, 2012, at 9:42 PM, Les Mikesell wrote:

> How do things like mongo and elasticsearch manage atomic operations
> while still being redundant?

Most such systems use some variant of primary-copy replication, also known as master/slave replication. That approach can provide consistency, but has much weaker availability properties.

-Justin
Re: Delete old record
On Jan 18, 2012, at 7:12 PM, kser wrote:

> Is there any way to delete old records?

This question could mean either of two things.

You can of course issue a delete request against any records you like, using any of Riak's APIs. If you would instead like records to be deleted automatically when they are old, and you are using the Bitcask storage engine, you can configure it for expiry:

https://help.basho.com/entries/466512-how-can-i-automatically-expire-a-key-from-riak

So, no matter which of the two questions you were asking -- the answer is "yes."

-Justin
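For reference, the setting behind that link goes in the bitcask section of app.config; the one-week value below is only an example:

  {bitcask, [
      %% Expire any entry older than 7 days. Expired entries vanish
      %% from the API immediately; their disk space is reclaimed at
      %% the next merge of the file they live in.
      {expiry_secs, 604800}
  ]}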
licenses (was Re: riakkit, a python riak object mapper, has hit beta!)
Hi, Andrey.

On Mar 1, 2012, at 10:18 PM, "Andrey V. Martyanov" wrote:

> Sorry for GPL, it's a typo. I just don't like GPL-based licenses,
> including LGPL. I think it's overcomplicated.

You are of course free to dislike anything you wish, but it is worth mentioning that the GPL and LGPL are very different licenses; the LGPL lacks the infectious aspects of the GPL. There are many projects which could not use GPL code compatibly with their preferred license, but which can safely use LGPL code.

Justin
Re: Question about the source code: riak_get_fsm
Hi, Marc.

I understand your confusion, as that code is a bit subtle. The reason this isn't a bug is that upon receiving the very first notfound in your situation, the "FailThreshold" check in the clause for notfound messages would return true -- since it would already know that it could never get 3 ok responses after that. The FSM would immediately send a notfound to the client and would not wait for the subsequent vnode responses.

I hope that this explanation was helpful.

Best,

-Justin

On Tue, Apr 13, 2010 at 9:00 AM, Marc Worrell wrote:

> Hi,
>
> I was reading the source code of riak_get_fsm to see how failure is
> handled. I stumbled on a construction that I don't understand.
>
> In waiting_vnode_r/2 I see that:
> 1. on receiving an ok: there is a check if there are R ok replies
> 2. on receiving notfound: there is a check if there are R (ok +
>    notfound) replies
>
> Now suppose I have R = N = 3.
> And I get back from the nodes the sequence: [notfound, ok, ok]
> Then #state.replied_r = 2, and #state.replied_notfound = 1.
> This will let "waiting_vnode_r({r, {ok, RObj}, ...)" stay in the state
> "waiting_vnode_r". Though we know we got an answer from all R (N)
> nodes, only a timeout will move the fsm further.
>
> Could this be handled differently, or am I missing something?
>
> - Marc
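In other words, the FSM gives up as soon as R successes become impossible. A simplified sketch of that predicate (names invented; the real check lives in riak_get_fsm):

  %% With N vnodes asked and R oks required, once the number of
  %% notfound replies exceeds N - R, R oks can never arrive, so the
  %% FSM can answer the client immediately.
  fail_threshold_reached(NotFounds, N, R) ->
      NotFounds > N - R.

  %% Marc's example: N = R = 3. After the first notfound,
  %% 1 > 3 - 3 holds, so a notfound is returned without waiting.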
Re: Big Changes in Riak Tip
On Wed, Apr 14, 2010 at 1:47 PM, Jonathan Lee wrote:

> I'm having trouble building with the latest tip on OS X 10.6. Does 0.10
> require Erlang R13B04?

Yes, it does. That (and the reason for it) will be in the 0.10 release notes. Our apologies for not making that clearer earlier.

-Justin
Re: sidebar :: quick webmachine question
Hi, Richard.

On Mon, Apr 19, 2010 at 12:08 PM, Richard Bucker wrote: > I read an article(from someone at basho) that said that WebMachine was going > to be more public or something like that. In the meantime it has been forked > several times and yet projects like riak integrate it. Other branches are > many months old. > So would the real webmachine please stand up.

Webmachine has been public for some time. It has its own mailing list and repo:

http://www.basho.com/developers.html#Webmachine
http://lists.therestfulway.com/mailman/listinfo/webmachine_lists.therestfulway.com
http://hg.basho.com/webmachine
http://webmachine.basho.com/docs.html

I hope that those references help you to find what you need. If you have more questions, please feel free to ask them. You might get even more useful answers from the people on the Webmachine mailing list. Cheers, -Justin ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: running naked : suggested firewall rules
On Wed, Apr 21, 2010 at 8:27 AM, richard bucker wrote: > If a riak server is insecure in the DMZ then it's also insecure in the > enterprise. I might be misunderstanding what you mean by this. I don't know of any enterprises that think it is a good idea to leave their Oracle databases directly reachable from the general internet. -Justin ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: setting default bucket props
If the N value for the bucket is lower than the R or W value in a request, then the request cannot succeed. That sounds likely in this case. An upcoming release will provide more useful messages when someone makes that particular client error. -Justin On Wed, Apr 28, 2010 at 12:35 PM, Matthew Pflueger wrote: > Doing what Sean suggested worked (or just specifying the chash_fun in > the default_bucket_props). Now I'm running into weird behavior that > I'm guessing is related to the n_val setting. I'm running three nodes > all on separate machines joined with a ring partition size of 64 > (22,21,21). On a fourth machine I'm running a load test in which a > process spawns 10 threads per node, each thread connecting to a one of > the nodes via protobuffs getting and putting random key/values in one > bucket. In my previous tests I used the default settings for the > bucket (n_val of 3) and everything ran smoothly for many hours. Now > I'm trying to set the default_bucket_props just changing the n_val to > 1. No errors in the logs and all clients connect successfully. > However, pretty much all communication times-out which does not happen > with the default bucket props (changing the n_val back to 3 fixes the > problem). > > --Matthew > > > > On Wed, Apr 28, 2010 at 11:39, Sean Cribbs wrote: >> We used to have a function that would merge the values from app.config with >> the hardcoded defaults for bucket properties. I've opened an issue on >> bugzilla for this problem (Bug 123). In the meantime, remove the stuff >> you've set, start up the console, and run this in the Erlang shell: >> application:get_all_env(riak_core). >> From that output, copy the default_bucket_props and modify what you want. >> Sean Cribbs >> Developer Advocate >> Basho Technologies, Inc. >> http://basho.com/ >> On Apr 28, 2010, at 10:57 AM, Matthew Pflueger wrote: >> >> Forgot to say I'm using riak-0.10.1... >> >> --Matthew >> >> >> >> On Wed, Apr 28, 2010 at 10:56, Matthew Pflueger >> wrote: >> >> I am trying to set the default n_val in my app.config. I'm not >> >> getting any errors on startup but when a client tries to put some data >> >> a process crashes eventually causing a time-out on the client side... 
>> >> app.config part: >> >> [ >> >> %% Riak Core config >> >> {riak_core, [ >> >> %% Default location of ringstate >> >> {ring_state_dir, "data/ring"}, >> >> %% Default bucket props >> >> {default_bucket_props, [{n_val, 1}]}, >> >> >> I'm seeing the following in the logs: >> >> sasl-error.log: >> >> =CRASH REPORT 28-Apr-2010::15:36:22 === >> >> crasher: >> >> initial call: riak_kv_put_fsm:init/1 >> >> pid: <0.505.0> >> >> registered_name: [] >> >> exception exit: {undef,[{riak_core_bucket,defaults,[]}, >> >> {riak_core_util,chash_key,1}, >> >> {riak_kv_put_fsm,initialize,2}, >> >> {gen_fsm,handle_msg,7}, >> >> {proc_lib,init_p_do_apply,3}]} >> >> in function gen_fsm:terminate/7 >> >> ancestors: [<0.504.0>] >> >> messages: [] >> >> links: [] >> >> dictionary: [] >> >> trap_exit: false >> >> status: running >> >> heap_size: 1597 >> >> stack_size: 24 >> >> reductions: 475 >> >> neighbours: >> >> erlang.log.1 >> >> =ERROR REPORT 28-Apr-2010::15:36:22 === >> >> ** State machine <0.503.0> terminating >> >> ** Last event in was timeout >> >> ** When State == initialize >> >> ** Data == {state, >> >> {r_object,<<"profiles">>,<<"DymvhHkDplIEmpowMdQ35Q">>, >> >> [{r_content, >> >> {dict,0,16,16,8,80,48, >> >> {[],[],[],[],[],[],[],[],[],[],[],[],[],[], >> >> [],[]}, >> >> >> {{[],[],[],[],[],[],[],[],[],[],[],[],[],[], >> >> [],[]}}}, >> >> <<>>}], >> >> [{<<31,41,45,38>>,{1,63439684582}}], >> >> {dict,1,16,16,8,80,48, >> >> >> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}, >> >> {{[],[],[],[],[],[],[],[],[],[], >> >> [[<<"content-type">>,97,112,112,108,105,99,97, >> >> >> 116,105,111,110,47,111,99,116,101,116,45,115, >> >> 116,114,101,97,109]], >> >> [],[],[],[],[]}}}, >> >> <<4,155,69,121,249,86,125,168,81,201,133,2,65,248, >> >> 238,53,23,1,40,242,226,220,30,37,113,164,204,34, >> >> >> 199,41,155,198,77,100,101,234,83,233,181,96,207,10, >> >> ...lots more data... >> >> ** Reason for termination = >> >> ** {'function not exported',[{riak_core_bucket,defaults,[]}, >> >> {riak_core_util,chash_key,1}, >> >>
Re: setting default bucket props
On Wed, Apr 28, 2010 at 1:38 PM, Matthew Pflueger wrote: > Stupid question: Is there a way to set the default read values for a > request on the server side when a client doesn't explicitly set them? Not currently. The defaults at this time are in the client libraries. -Justin ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Hello, Bitcask!
Riak Users,

You might have noticed that we released a new local key/value store recently: http://blog.basho.com/2010/04/27/hello,-bitcask/

As of just now, it is available as a storage engine ("backend") in the tip of the Riak repository. You can use it like any other backend just by setting the storage_backend application variable in the riak_kv application to riak_kv_bitcask_backend (in your "app.config") on a fresh node so that it will use Bitcask for storage.

There is a new application in app.config, "bitcask", for more detailed configuration of bitcask behavior. Some of the variables you can set in there are:

data_root: string (required) - the directory for bitcask to use for storage and metadata
merge_strategy: {hours, N} - perform a data file merge every N hours
sync_strategy: how to manage syncing of data files being written. choices:
  none (default) - let the O/S decide
  o_sync - use the O_SYNC flag to sync each write
  {seconds, N} - call bitcask:sync/1 every N seconds

A few things aren't done yet, including more proactive generation of hintfiles, faster startup time, smarter merge strategies, more extensive testing on more platforms, documentation on usage, and more. We are not yet recommending this as a primary production backend, but we expect to very soon.

Your feedback is welcomed.

-Justin

p.s. -- it's not slow.

___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
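Putting the above together, a hedged example of the two relevant app.config sections; the path and values are illustrative choices, not recommendations:

    {riak_kv, [
        %% Select Bitcask as the storage engine:
        {storage_backend, riak_kv_bitcask_backend}
    ]},
    {bitcask, [
        {data_root, "data/bitcask"},    %% required
        {merge_strategy, {hours, 4}},   %% merge data files every 4 hours
        {sync_strategy, {seconds, 60}}  %% call bitcask:sync/1 every minute
    ]}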
Re: Riak Bitcask backend is very unstable on OS X 10.6.3
Hello,

That error message is due to running out of filehandles. I am guessing that you have a large number of empty files in your bitcask data directories. If so, there are two pieces of information you may find useful:

1 - It is safe to delete the empty files.
2 - This will be addressed very soon, before bitcask is considered an officially-supported backend.

-Justin ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Replication behavior
Hi, Jimmy. With an n_val of 3, there will be 3 copies of each data item in the cluster even when there are fewer than 3 hosts. With 2 nodes in that situation, each node will have either 1 or 2 copies of each item. Does that help with your understanding? -Justin ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: CAP controls
Hi, Jeremy. It sounds like an interesting project. At this time, there is no way to indicate in Riak that two nodes are actually on the same host (and therefore should not overlap in replica sets). It could certainly be done, but to do so today would require modification to the ring partition claim logic. Best, -Justin On Thu, May 13, 2010 at 4:57 PM, Jeremy Hinegardner wrote: > I am thinking about how to possibly replace an existing system that has heavy > I/O load, low CPU usage, with riak. Its a file storage system, with smallish > files, a few K normally, but billions of them. > > The architecture, I think, would be one riak node per disk on the hardware, > and probably run about 16 riak nodes per physical machine. Say I had > 4 of these machines, which would be 64 riak nodes. > > With something like this, if I set W=3 as a CAP tuning, I would want to make > sure that at least 2 of those writes where on 2 physically different machines, > so in case I had a hardware failure, and it took out a physical machine, I > could > still operate with the other 3 machines. > > Is something like this possible with riak? > > enjoy, > > -jeremy > > -- > > Jeremy Hinegardner jer...@hinegardner.org > > > ___ > riak-users mailing list > riak-users@lists.basho.com > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com > ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: returning multiple documents
Hi, Gareth, You've pretty much hit on it. Either of your two options will work fine. Regards, -Justin ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Recovering datas when a node was joining again the cluster (with all node datas lost)
Hello, Germain. You've already come across read-repair. Between that and hinted-handoff a great deal of passive anti-entropy is performed in a Riak cluster. As long as one doesn't use requests with R=N these mechanisms are generally sufficient. We do have plans for a more "active" anti-entropy as well, so that if you know a given node has lost all of its data you can trigger a much more efficient and immediate complete repair from the replicas in other nodes. (without needing external backups) At this point, that is only a plan and not a developed feature, so if you don't perform backups then a trawling read-repair is your best bet in the case of a complete loss of a node. -Justin ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: cannot query bucket when a node is down
On Tue, Jun 1, 2010 at 1:56 PM, Sam Tingleff wrote: > With no single point of failure there is no single index of keys. So > the only way to get an exhaustive list of keys in a given bucket is to > ask all nodes (I do not know if this is what riak is actually doing). Sam is exactly right that Riak doesn't centralize anything and so there is no collected index of keys. However, you don't quite have to ask every node; you have to ask enough nodes to know that you hit at least one replica of every object. This is what listing keys (GET /bucket) does. There was a bug, just recently fixed, that could in some cases cause the whole listing to hang due to a single misbehaving or down node, depending on timing. The fix was placed in tip this morning and will go out in the next release. As long as you have enough nodes around that you could get every object in the cluster using R=1, you should be able to list the keys in a bucket. -Justin ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: I need distributed file system ?
Hello, Antoni. Riak handles all the distribution for you, and generally expects to store its data to a local filesystem. You do not need or want any sort of underlying distributed filesystem in addition to Riak. Best, -Justin ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Switching of backends
Germain, If you have enough excess capacity that your cluster will be safe with one less machine for a little while, you can do this another way. Just "riak-admin leave" one machine, wait for it to hand off all of its data, "riak stop", set up that machine with a new install/config-file/backend/etc, and then start and join it as though it was a brand new node. Wait for it to get its share of data sent to it in its new role, then repeat this process on the next node. -Justin ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Switching of backends
On Tue, Jun 8, 2010 at 8:29 AM, Mårten Gustafson wrote: > How would I know when a node has handed off all its data - would the > status command report that it doesn't own any partitions? Good question. That won't quite do it, because the node will give up ownership of the partitions first, and that will cause it to begin pushing off that data to the new owners. We hope to add a more obvious sign in the stats resource for this, but for now the easiest way to tell is to just look at the disk usage in the exiting node's data directory. It should become empty when the node completes handing off data. -Justin ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: [ANN] Riak Release 0.11.0
Hi, Germain.

On Fri, Jun 11, 2010 at 11:07 AM, Germain Maurice wrote: > Because of its append-only nature, stale data are created, so, how does > Bitcask to remove stale data ?

An excellent question, and one that we haven't yet written enough about.

> With CouchDB the compaction process on our data never succeed, too much > data. > I really don't like to have to launch manually this kind of process.

Bitcask's merging (compaction) process is automated and very tunable. These parameters are the most relevant in your bitcask section of app.config (see the whole thing at http://hg.basho.com/bitcask/src/tip/ebin/bitcask.app):

%% Merge trigger variables. Files exceeding ANY of these
%% values will cause bitcask:needs_merge/1 to return true.
%%
{frag_merge_trigger, 60},              % >= 60% fragmentation
{dead_bytes_merge_trigger, 536870912}, % Dead bytes > 512 MB

%% Merge thresholds. Files exceeding ANY of these values
%% will be included in the list of files marked for merging
%% by bitcask:needs_merge/1.
%%
{frag_threshold, 40},                  % >= 40% fragmentation
{dead_bytes_threshold, 134217728},     % Dead bytes > 128 MB
{small_file_threshold, 10485760},      % File is < 10 MB

Every few minutes, the Riak storage backend for a given partition will send a message to bitcask, requesting that it queue up a possible merge job. (Only one partition will be in the merge process at once as a result of that queue.) The bitcask application will examine that partition when that request reaches the front of the queue. If any of the trigger values have been exceeded, then all of the files in that partition which exceed any threshold values will be run through compaction. This gives you a great deal of flexibility, and also provides reasonable amortization of the cost, since each partition is processed independently.

-Justin ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: [ANN] Riak Release 0.11.0
Hi, Alan.

Your replicas do in fact exist on both nodes. However, I understand that the situation you are observing is confusing. I will attempt to explain.

Quite some time ago, something surprising was noticed by some of our users during their pre-production testing. Some intentional failure scenarios (with busted nodes, etc) would fail much more slowly when R=1 than when R=2. This was due to the fact that to satisfy an R=1 request with a non-object response (timeout or notfound), we would wait for all N nodes to reply. With R=2, we could send this response as soon as N-1 nodes had replied. In some situations this is a dramatic difference in time.

To remove this perceived problem we implemented what we refer to as "basic quorum". If a simple majority of vnodes have produced non-successful internal replies, we return a non-success value such as a notfound. This means that if there is only one copy of the object out there, and the node holding it is slowest to respond, the client will not see that object in the response but will get a notfound rather than waiting for the last node to respond or time out. (Note that read-repair will still occur in any case.)

This could be avoided if we considered "not found" to be a success condition, but then in the above situation you would see notfounds even with R=2. That would simply be defined as another kind of "successful" response. Either way, it is a tradeoff between different kinds of surprise.

I hope that this explanation helps with your understanding.

On another note, it's not useful to run Riak with fewer physical hosts than your N value unless you're planning on expanding it soon. So: testing with 2 hosts and N=3 means that you are testing against a very much not-recommended configuration. I suggest either using more hosts or else changing your default bucket N value to 2.

-Justin

On Mon, Jun 14, 2010 at 1:59 PM, Alan McConnell wrote: > Hey Dan, > I have a 2-node cluster with default bucket settings (N=3, etc.), and if I > take one of the boxes down (and perform reads with R=1) I get tons of "key > not found" errors for keys I know exist in the cluster. Seems like for many > keys, all 3 replicas live on one host. From what you've written here > though, it seems like that should not happen. Do you know of any way my > cluster could have gotten into this state? > I did run a restore on this cluster using a riak-admin backup from a > different, single-node cluster. I wonder if that caused an uneven > distribution. > Any help would be appreciated. As it stands now our 2-node cluster has > serious read problems if either node goes down. > -Alan ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Riak Recap for 6/10 - 6/13
Hi, Joel. Thanks for your input! On Mon, Jun 14, 2010 at 4:17 PM, Joel Pitt wrote: > [re bitcask and in-memory data] > I'm sure it's probably already been considered, but just in case... > bloom filters could be an alternative to the requirement of keeping > *all* the keys in memory. I don't know if this would necessarily fit > with the usage of this in-memory key/metadata data structure though. We are exploring ways to keep bitcask's overall performance profile while relaxing the memory requirement a bit, though we have not yet determined how to do so. Bloom filters can be incredibly handy, but wouldn't (alone) solve this problem as we use the in-memory hash table to tell bitcask the location of the stored value in terms of file and offset. -Justin ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Best way to back-up riak
Hi, Jan.

On Sun, Jul 11, 2010 at 8:53 PM, Jan Vincent wrote: > Given that riak is new in the database field, if ever I use riak in > production, > what would be the best way to back it up? I know that there's redundancy > on the different nodes and NRW may be modifiable per request, but I'm > wondering if there's a way to snapshot the dataset periodically -- at least > until riak becomes provably battle tested.

Riak is fairly battle-tested already: we were using its prior version under Basho's own customer-facing applications in 2008, and a number of external customers and users are in production today. That said, even a solid distributed database needs to be backed up, as there are many reasons to have backups.

The easiest and best way to back up Riak, if you are using bitcask (the default) as the backend, is to simply back up the filesystem of your nodes with whatever backup system you use for the rest of your systems. Bitcask uses append-only files, and once it closes a file it will never change the content of that file again. This makes it very backup-friendly.

If you are using a backend with a less backup-friendly disk format (such as innostore), then you can use the "riak-admin backup" command at either the per-node or whole-cluster level to produce a backend-independent snapshot that can be loaded back in via "riak-admin restore". This method is much slower, will impose additional load on your cluster when running, and requires that you have a place to put the generated snapshot. However, it works regardless of backend and is also a simple if heavyweight way to migrate to a cluster with a different configuration.

-Justin ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
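For the second approach, the invocations look roughly like the following; node name, cookie, and path are illustrative, so check riak-admin's usage output on your release before relying on them:

    riak-admin backup riak@node1.example.com riak /backups/riak-snapshot all
    riak-admin restore riak@node1.example.com riak /backups/riak-snapshot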
Re: Conflict Resolution
Hello, Misha. On Tue, Jul 13, 2010 at 1:06 PM, Misha Gorodnitzky wrote: > From doing a little testing, the last value in a multipart document is > the first, so "Thursday" in this case, can we assume that this will > always be the case? And is it a good idea to base conflict resolution > on this? It is not really a good idea to base conflict resolution on the order that Riak presents the siblings. While in simple cases you may see predictable behavior, there is no guarantee of determinism in the order they'll be stored in. I suggest instead that if you need an interesting conflict resolution strategy, you might do well to store the information needed for that strategy explicitly in the object along with the content. I hope that this helps. Please do ask more if this doesn't clear it up for you. -Justin ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: riak slides?
Hi, Wilson. There are many sets out there. Which ones suit you best depends a lot on what you plan on saying in your talk. If you tell us a bit about the audience, the event, and what you hope to get across in your talk, then I bet that one of the people here who has given a Riak talk will have material useful for you to crib from. Cheers, -Justin 2010/7/13 Wilson MacGyver : > Hi, > > I'm going to be giving a talk on riak sometime soon. Anyone has slides > I can steal/borrow? :) > > Thanks > > -- > Omnem crede diem tibi diluxisse supremum. > > ___ > riak-users mailing list > riak-users@lists.basho.com > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com > ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Conflict Resolution
On Wed, Jul 14, 2010 at 5:25 AM, Misha Gorodnitzky wrote: > I don't suppose there are any examples anywhere of how people have > approached conflict resolution with RIak? That would be useful to help > people understand how to approach it ... maybe a section on the wiki > could be dedicated to it. This is a great idea. We'll find a right place to present this that's easier to find. > In our particular case, we're trying to store transactions in Riak and > need to guard against a transaction being placed on a balance that has > reached 0. The problem we keep running into is race conditions between > when we record the transaction and when we update the cached balance > value. Any suggestions on how this has been, or can, be solved would > be appreciated. I suggest that you solve this similarly to the way that banks have been doing so for far longer than there have even been computers involved. Each transaction should be (at least) a unique identifier, a time, and the amount being added or subtracted to the balance. This way (in addition to storing what you believe the balance to be at any time) you can reconcile balances even if you get some transactions late or multiple times. More specifics than that will depend a lot on your application, but the key here is that you can make things much neater in situations where your actions can be commutative and idempotent. That's why you store the transaction itself instead of just the balance, and a unique id so that you don't repeat yourself. Best of luck, -Justin ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
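As a deliberately simplified Erlang sketch of that idea -- the {TxId, Time, Amount} record shape and the merge rule are assumptions for illustration, not a prescribed design:

    -module(txn_merge).
    -export([merge/1, balance/1]).

    %% Each sibling value is a list of {TxId, Time, Amount} tuples, with
    %% TxId unique per transaction. Merging siblings is a union keyed on
    %% TxId, so a transaction that arrives late or more than once still
    %% counts exactly once (idempotent), and order does not matter
    %% (commutative).
    merge(SiblingValues) ->
        Union = lists:foldl(
                  fun(Txns, Acc) ->
                          lists:foldl(fun({Id, _, _} = T, A) ->
                                              dict:store(Id, T, A)
                                      end, Acc, Txns)
                  end, dict:new(), SiblingValues),
        [T || {_Id, T} <- dict:to_list(Union)].

    balance(Txns) ->
        lists:sum([Amount || {_Id, _Time, Amount} <- Txns]).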
Re: Expected vs Actual Bucket Behavior
Hi, Eric! Thanks for your thoughts.

On Tue, Jul 20, 2010 at 12:39 PM, Eric Filson wrote: > I would think that this requirement, > retrieving all objects in a bucket, to be a _very_ common > place occurrence for modern web development and perhaps (depending on > requirements) _the_ most common function aside from retrieving a single k/v > pair.

In my experience, most people try to write applications that avoid selecting everything from a whole bucket/table/whatever, but different people have different requirements. Certainly, it is sometimes unavoidable.

> In my mind, this seems to leave the only advantage to buckets in this > application to be namespacing... While certainly important, I'm fuzzy on > what the downside would be to allowing buckets to exist as a separate > partition/pseudo-table/etc... so that retrieving all objects in a bucket > would not need to read all objects in the entire system

The namespacing aspect is a huge advantage for many people. Besides the obvious way in which that allows people to avoid collisions, it is a powerful tool for data modeling. For example, sets of 1-to-1 relationships can be very nicely represented as something like "bucket1/keyA, bucket2/keyA, bucket3/keyA", which allows related items to be fetched without any intermediate queries at all. One of the things that many users have become happily used to is that buckets in Riak are generally "free"; they come into existence on demand, and you can use as many of them as you want in the above or any other fashion. This is in essence what conflicts with your desire. Making buckets more fundamentally isolated from each other would be difficult without incurring some incremental cost per bucket.

> I might recommend a hybrid > solution (based in my limited knowledge of Riak)... What about allowing a > bucket property named something like "key_index" that points to a key > containing a value of "keys in bucket". Then, when calling GET > /riak/bucket, Riak would use the key_index to immediately reduce its result > set before applying m/r funcs. While I understand this is essentially what > a developer would do, it would certainly alleviate some code requirements > (application side) as well as make the behavior of retrieving a bucket's > contents more "expected" and efficient.

A much earlier incarnation of Riak actually stored bucket keylists explicitly in a fashion somewhat like what you describe. We removed it because one of our biggest goals is predictable and understandable behavior in a distributed-systems sense, and a model like this turns each write operation into at least two operations. This isn't just a performance issue; it also adds complexity. For instance, it is not immediately obvious what should be returned to the client if a data item write succeeds but the read/write of the index fails.

Most people using distributed data systems (including but not limited to Riak) do explicit data modeling, using things like key identity as above, or objects that contain links to each other (Riak has great support for this), or other data modeling means to plan out their expected queries in advance.

> Anyway, information is pretty limited on riak right now, seeing as how it's > so new, but talk in my development circles is very positive and lively.

Please do let us know any aspects of information on Riak that you think are missing. We think that between the wiki, the web site, and various other materials, the information is pretty good. 
Riak's been open source for about a year, and in use longer than that; while there are many things much older than Riak, we don't see relative youth as a reason not to do things right. Thanks again for your thoughts, and I hope that this helps with your understanding. -Justin ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Expected vs Actual Bucket Behavior
I think that we are all (myself included) getting two different issues a bit mixed up in this discussion:

1: storing an implicit index of keys in the Riak key/value store
2: making buckets separate in that a per-bucket operation's performance would not be affected by the content of other buckets

The thread started out with a request for #2, but included a suggestion to do #1. These are actually two different topics.

The first issue, implicitly storing a big index of keys, is impractical in a distributed key/value storage system that has Riak's availability goals. We are very unlikely to implement this as described in the near future. However, we very much recognize that there are many different ways that people would like to find their data. In that light, we are working on multiple different efforts that will use the Riak core to provide data storage with more than just "simple" key/value access.

The second issue, of isolating buckets, is a much simpler design choice and is also a per-backend implementation detail. We can create and provide an alternative bitcask adapter that does this. It will be a real tradeoff: in exchange for buckets not impacting each other as much, the system will consume more filehandles, be a bit less efficient at rebalancing, and will generally make buckets no longer "free". This is a reasonable tradeoff in either direction for various applications, and I support making it available as a choice. I have created a bugzilla entry to track it: https://issues.basho.com/show_bug.cgi?id=480

I hope that this helps to clarify the issue. -Justin ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Expected vs Actual Bucket Behavior
Hi, Alexander. On Wed, Jul 21, 2010 at 1:36 PM, Alexander Sicular wrote: > uses a separate bitcask per-bucket per-partition. What is a partition here? A > vnode or a physical host or something else? My apologies. Given that it was in our bugzilla I let myself use some Riak-internals jargon without explanation. In this context, a partition is a logical segment of the ring space, managed by a vnode process on a given physical host. There is a 1-to-1 mapping between a vnode process and a partition. The idea is that right now the bitcask backend stores all data in a given partition together in a single bitcask instance. The alternative backend under discussion would break that up, such that within a partition (and thus in each vnode), there would be a bitcask instance for every bucket that had any data. Does that help to clarify? -Justin ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Is it inefficient to map over a small bucket when you have millions of other buckets?
On Tue, Jul 13, 2010 at 6:02 AM, Nicolas Fouché wrote: > Giving just a bucket WILL traverse the entire keyspace. You may be interested in: https://issues.basho.com/show_bug.cgi?id=480 -Justin ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Best way to back-up riak
On Wed, Jul 21, 2010 at 2:01 PM, Alan McConnell wrote: > I'm curious about this as well. Say I have a ten node cluster. Could I > just schedule a midnight copy of each bitcask data directory every night, > then restore to another ten node cluster by dropping one of each data > directories on each new node? How close does the timing needs to be? What > if the data directory snapshots were taken seconds or minutes apart? While Basho does provide a product including features that make whole-datacenter failure much less of a problem (by fully replicating to a cluster in another location) I will answer assuming you have only a single cluster. The timing doesn't have to be perfectly synchronized, but you should try to make it as close as is practical just so that you have a good way to judge what is contained in a given backup. If a storage (put) operation occurs in an interval between single-node backups, it will be present in the restored cluster when requested (and repopulated via read-repair) as long as it was in at least one of the nodes. -Justin ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Use of fallback nodes for get requests?
Hi, Nico.

On Mon, Aug 2, 2010 at 1:19 PM, Nico Meyer wrote: > What I mean is, if I do a get request for a key with R=N, and one of the > first N nodes in the preflist is down the request will still succeed. > Why is that? Doesn't that undermine the purpose of seting R to a high > number (specifically setting it to N)? That way a request might succeed > even if all primary nodes responsible for the key are unavailable.

You are correct, and this is intentional. There is nothing in the R or W settings that is intended to indicate anything at all about "primary" nodes. It is rather simply the number of successful responses that the client wishes to wait for, and thus the degree of quorum sought before a client reply is sent.

Using fallback nodes to satisfy reads is a natural result of using fallback nodes to satisfy writes. If all primary nodes responsible for a key are unavailable, but enough of the fallback nodes for that key have received a value for it (through fallback writes) since the primaries became unavailable, then a request to get that key might succeed. I am not sure why you see this as a bad thing. (It will only succeed if R nodes actually provide a successful result, not just if they are available.)

> On a similar note, why is the riak_kv_get_fsm waiting for at least > (N/2)+1 responses, if there are only not_found responses, effectively > ignoring a smaller R value of the request if the key does not exists?

This is a compromise to deal with real situations that can occur where a single node might be taking a very long time to reply, and a value has never been stored for a given key. Without either this basic quorum default for notfounds, or alternately considering a notfound as success and thus only waiting for R of them, an R=1 request would take much longer to complete than an R=2 request (due to waiting for the slow node), which is confusing to most users. Note that since it applies to notfounds, this tends to only come into play for items that have never been successfully stored with at least a basic quorum -- things that really are not present, that is.

> My guess was, that this also has to do with the use of fallback nodes: > Since the partition will usually be very small on the fallback/handoff > node, it is likely to be the first to answer. So to avoid returning > false not_found responses, a basic quorum is required. > Am I on the right track here?

It doesn't have anything to do with fallback nodes explicitly. It is for situations where a node is under any condition that will slow it down significantly. In such situations, there is little to be gained in waiting for all N replies if (N/2)+1 have already declared notfound.

> The problem is, this is imposed even for the case that all nodes are up. > If one requires very low latency or very high availability (that's why > one uses a small R value in the first place) and does a lot of gets for > non existent keys, riak silently screws you over by raising R for those > keys.

It seems that there is something here worth clarifying. If you are issuing requests with W+R<=N, and some reads following writes return notfound during an interval immediately following initial storage time... well, that's what you asked for by not requesting a quorum. If you store the object with a sufficiently high W value first, then you will not get this sort of notfound response even if your R value is only 1. 
I suppose that providing the freedom to do this might be considered "screwing you over," but we see it more as allowing you to make different choices while still providing safe and unsurprising default behavior. If you try hard enough to screw yourself over, though, Riak won't stop you. If you issue write requests (to any dynamo-model system) with some W, followed immediately by a read request with some R, and W+R is not greater than N, you should not be expecting the write to necessarily be reflected yet. > I most likely missed something here, but some ad hoc test I did seem to > be consistent with my understanding of the code. You have certainly put some real effort into understanding some choices made in the Riak code, which I appreciate. I hope that I have helped to extend your understanding of the real operational scenarios that have motivated those choices, and how the code will behave in those scenarios. Best, -Justin ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
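The rule of thumb in that last paragraph, as a one-line Erlang sketch (illustrative only):

    %% A read asking R replicas is guaranteed to overlap a write
    %% acknowledged by W replicas (out of N) only when R + W > N.
    read_overlaps_write(N, R, W) -> R + W > N.
    %% e.g. N=3: R=1, W=1 -> false (fresh writes may be invisible)
    %%           R=2, W=2 -> true  (quorum overlap guaranteed)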
Re: Riak Heterogeneity
Hi, Michael. On Tue, Aug 17, 2010 at 12:52 PM, Michael Russo wrote: > In the Dynamo design, the number of vnodes per physical node can be tweaked > to satisfy the heterogeneity principle. > Is there any way to do something similar with Riak? This is something that we think is an important idea, and one that the underlying structure of Riak can accommodate. However, simply as a matter of prioritization we have not yet made it easy to do, and doing it effectively is not simple from a user's point of view. I do not know of any production clusters at this time that use anything other than the standard near-equal distribution of vnodes. I do expect that explicitly configuring different nodes to have different "weight" will be enabled in a future release, but it is not currently on anyone's scheduled plans that I know of. -Justin ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: failed to merge?
Hi, Wilson. On Sat, Aug 21, 2010 at 10:06 PM, Wilson MacGyver wrote: > =ERROR REPORT > Failed to merge > follow by a bunch of list of bitcask files > > with final status > > : no_files_to_merge > > how does this happen, does this mean some files in the bitcask are missing? That's just an overenthusiastic message, and nothing to worry about. It was a very useful thing to see when doing the initial integration of bitcask as a Riak backend. The message will no longer appear in your error log as of a subsequent release. All it means is that one merge was scheduled while another was running, so the first one did all the work and the second had nothing to do. -Justin ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
list_keys is less bad
Riak Users,

One aspect of Riak's interface that has often been discouraged in the past is the listing of all keys in a bucket. This has been for two reasons: first, it is necessarily a more heavyweight operation than any of the more targeted get/put/delete sorts of things; second, given the priorities of Riak's first many users, we hadn't really put much optimization into that area. As a result, anything that required getting all keys from a bucket was fairly slow and also fairly heavy in terms of memory consumption.

We have put some effort into this recently and seen marked improvement. The changes can be summed up as:

1- bitcask has a new fold_keys operation, which performs far less I/O in most cases than the previous mechanism underlying list_keys.
2- the Riak backend interface to bitcask uses the new fold_keys operation.
3- the mechanism underlying the cluster-wide list_keys operation has changed to require far less total memory in proportion to the list.

Due to these three changes, there are two effective results:

1- In nearly all cases, the list_keys operation is much faster than before. In some common cases it is 10 times faster.
2- In cases of very large buckets, memory allocation will not spike during key listing. (Though of course if you ask Riak to build the whole list for you instead of streaming it out, at least that much memory must be used to hold it.)

Note that since map/reduce uses the streaming list_keys under the hood when performing map/reduce over a whole bucket, these changes affect that interface's performance as well.

The described changes are now in the trunks of the relevant repositories, and will be included in the next release.

-Justin ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
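A hedged sketch of consuming the streaming form from the Erlang PB client (riakc); the bucket name is illustrative, and the message shapes shown are the ones riakc generally delivers for streaming requests:

    -module(stream_keys).
    -export([count/0]).

    count() ->
        {ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087),
        {ok, ReqId} = riakc_pb_socket:stream_list_keys(Pid, <<"mybucket">>),
        collect(ReqId, 0).

    %% Keys arrive in batches as Erlang messages rather than as one big
    %% list, so memory use stays proportional to a batch, not the bucket.
    collect(ReqId, Count) ->
        receive
            {ReqId, {keys, Keys}} ->
                collect(ReqId, Count + length(Keys));
            {ReqId, done} ->
                {ok, Count}
        end.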
Re: list_keys is less bad
On Mon, Aug 23, 2010 at 10:05 PM, Alexander Sicular wrote: > Three cheers! :-) > Git clone && make all && make rel It looks like they haven't yet migrated out to the github repos, but should do so sometime soon. In the meantime, the bitbucket repos are up to date with tip so you can get the bleeding edge from there. Sorry if there was any confusion there. -Justin ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Filesize in riak
Hi, John. On Thu, Sep 2, 2010 at 11:24 AM, John Axel Eriksson wrote: > I know the recommendation of max 50 megs per file in riak currently... but I > tried > uploading a file that was around 120 megs and everything went fine. Riak doesn't itself mandate a maximum object size... but since a riak_object in transit must be materialized into an in-memory data structure and copied across processes, large objects can cause very poor performance or failure. The exact practical maximum can vary a bit. There is some (prototyped but not yet fully integrated) work that, when released, will allow you to store large objects easily by transparently chunking them into a hash tree of smaller objects. More details on this when it lands. I hope that this information is helpful. Best, -Justin ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Riak and no of clients limit?
Hello, Senthilkumar. On Fri, Sep 3, 2010 at 4:28 PM, Senthilkumar Peelikkampatti wrote: > I am using Riak with distributed Erlang and I wanted to know what's > the limit on # of riak clients (I used it before erlang pb client, so yet to > migrate). I am using single client to talk to Riak, is it better? or in web, > is it ok to create a client per request? I looked riak_kv_wm_raw.erl which > seems using a connection per request but it is a erlang local_client. There is not a fixed limit imposed by Riak, but it is generally good practice to re-use clients for subsequent (non-concurrent) requests. -Justin ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Riak and no of clients limit?
Hi, Seth.

On Sat, Sep 4, 2010 at 5:59 PM, Seth Falcon wrote: > I'm working on a project where we have a webmachine-backed service > that talks to Riak. I currently initialize one pb client for each > node in the cluster as part of the webmachine startup. Then the > resources in the webmachine app ask for one of these clients for each > request. > > Your comment above about reusing clients for non-concurrent requests > makes me wonder if this is the wrong approach. Comments or > suggestions?

Each instantiated riak_client has a unique client-id that will represent that client in all updates (put-requests) that it makes. That is, the entries in the vector clock will match that client-id. Much of the value of vector clocks can vanish if concurrent writes to the same values can be issued with the same client-id.

Sharing connections as you describe might be fine, depending on the details. However, if your resources might overlap in a way like the following example then you probably have a problem. A and B are resource instances handling separate concurrent HTTP requests but sharing a client-id C.

A issues get(K), receiving object X with vector clock V
B issues get(K), receiving object X with vector clock V
A issues put(K,Xa) where Xa is an update to X
B issues put(K,Xb) where Xb is an update to X

You can lose one of the two updates, as they are both a single update to V from client C. It is assumed that a given client will not compete with itself for updates.

I hope that this explanation is helpful. -Justin ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Riak and no of clients limit?
On Sat, Sep 4, 2010 at 7:31 PM, Seth Falcon wrote: > Given that, it sounds like one would want a pool of pb clients such > that each resource takes a client out of the pool when handling a > request and returns it when done. So there would be no concurrent > requests going through the same client. > > Does that seem like a reasonable approach? Yes. -Justin ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
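A minimal sketch of that checkout/checkin shape in plain Erlang; an existing pooling library is the better choice in practice, and this just illustrates one connection (and thus one client-id) per concurrent request:

    -module(pb_pool).
    -export([start/3, checkout/1, checkin/2]).

    %% Open Size connections up front; each riakc connection carries
    %% its own client-id, so no two concurrent users ever share one.
    start(Host, Port, Size) ->
        Conns = [begin
                     {ok, P} = riakc_pb_socket:start_link(Host, Port),
                     P
                 end || _ <- lists:seq(1, Size)],
        spawn(fun() -> loop(Conns) end).

    checkout(Pool) ->
        Pool ! {checkout, self()},
        receive {pb_conn, Conn} -> Conn end.

    checkin(Pool, Conn) ->
        Pool ! {checkin, Conn}.

    %% Checkout requests simply wait in the pool's mailbox while no
    %% connection is free; a checkin frees one, and the selective
    %% receive then serves the pending request.
    loop(Avail) ->
        receive
            {checkout, From} when Avail =/= [] ->
                [Conn | Rest] = Avail,
                From ! {pb_conn, Conn},
                loop(Rest);
            {checkin, Conn} ->
                loop([Conn | Avail])
        end.

A resource checks out a connection for the duration of one request and checks it back in afterward, so no two concurrent requests write under the same client-id.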
Re: Listing large key spaces, and bucket Links header
Hi, Gavin. A couple of things you may be interested in: - There have been improvements in both Bitcask and Riak since 0.12.1 (in tip of trunk and will be in the next release) to speed up (and reduce the resource consumption of) key listing. - You should probably use keys=stream in your requests instead of keys=true, to avoid the full keylist (and Link header) being built up all at once. Between those two items, you may have all that you need. I hope this helps. -Justin ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: File descriptor leaks?
Hi, Dmitry. What version of Riak are you using? And is there anything interesting in the error logs? -Justin On Thu, Oct 14, 2010 at 7:53 AM, Dmitry Demeshchuk wrote: > A small update. I've just encountered the same problem. Just about 3-4 > hours have passed. > > lsof | wc -l showed only about 2k descriptors for all users. That's > even more weird as the 32k descriptors limit is per user. So, we > haven't reached the limit so far. > > On Thu, Oct 14, 2010 at 3:48 PM, Dmitry Demeshchuk > wrote: >> Greetings. >> >> We have recently started to get the emfile errors. ulimit -n is 32767. >> Restarting Riak helps for several hours and then we run out of >> descriptors again. >> >> Some time later after restart I performed lsof and found the following >> descriptors: >> >> kondemand 154 root cwd unknown >> /proc/154/cwd (readlink: Permission denied) >> kondemand 154 root rtd unknown >> /proc/154/root (readlink: Permission denied) >> kondemand 154 root txt unknown >> /proc/154/exe (readlink: Permission denied) >> kondemand 154 root NOFD >> /proc/154/fd (opendir: Permission denied) >> kondemand 155 root cwd unknown >> /proc/155/cwd (readlink: Permission denied) >> kondemand 154 root cwd unknown >> /proc/154/cwd (readlink: Permission denied) >> kondemand 154 root rtd unknown >> /proc/154/root (readlink: Permission denied) >> kondemand 154 root txt unknown >> /proc/154/exe (readlink: Permission denied) >> kondemand 154 root NOFD >> /proc/154/fd (opendir: Permission denied) >> kondemand 155 root cwd unknown >> /proc/155/cwd (readlink: Permission denied) >> kondemand 155 root rtd unknown >> /proc/155/root (readlink: Permission denied) >> kondemand 155 root txt unknown >> /proc/155/exe (readlink: Permission denied) >> kondemand 155 root NOFD >> /proc/155/fd (opendir: Permission denied) >> kondemand 156 root cwd unknown >> /proc/156/cwd (readlink: Permission denied) >> kondemand 156 root rtd unknown >> /proc/156/root (readlink: Permission denied) >> kondemand 156 root txt unknown >> /proc/156/exe (readlink: Permission denied) >> kondemand 156 root NOFD >> /proc/156/fd (opendir: Permission denied) >> kondemand 157 root cwd unknown >> /proc/157/cwd (readlink: Permission denied) >> kondemand 157 root rtd unknown >> /proc/157/root (readlink: Permission denied) >> kondemand 157 root txt unknown >> /proc/157/exe (readlink: Permission denied) >> kondemand 157 root NOFD >> /proc/157/fd (opendir: Permission denied) >> kondemand 158 root cwd unknown >> /proc/158/cwd (readlink: Permission denied) >> kondemand 158 root rtd unknown >> /proc/158/root (readlink: Permission denied) >> kondemand 158 root txt unknown >> /proc/158/exe (readlink: Permission denied) >> >> Also, the following couple of descriptors is opened several times at >> the same time: >> >> bash 20176 dem mem REG 252,0 256316 1179925 >> /usr/lib/locale/en_US.utf8/LC_CTYPE >> bash 20176 dem mem REG 252,0 54 1179926 >> /usr/lib/locale/en_US.utf8/LC_NUMERIC >> bash 20176 dem mem REG 252,0 2454 1179927 >> /usr/lib/locale/en_US.utf8/LC_TIME >> bash 20176 dem mem REG 252,0 966938 1179928 >> /usr/lib/locale/en_US.utf8/LC_COLLATE >> bash 20176 dem mem REG 252,0 286 1179929 >> /usr/lib/locale/en_US.utf8/LC_MONETARY >> bash 20176 dem mem REG 252,0 52 1179930 >> /usr/lib/locale/en_US.utf8/LC_MESSAGES/SYS_LC_MESSAGES >> bash 20176 dem mem REG 252,0 34 1179931 >> /usr/lib/locale/en_US.utf8/LC_PAPER >> bash 20176 dem mem REG 252,0 77 1179932 >> /usr/lib/locale/en_US.utf8/LC_NAME >> bash 20176 dem mem REG 252,0 155 1179933 >> /usr/lib/locale/en_US.utf8/LC_ADDRESS >> 
bash 20176 dem mem REG 252,0 59 1179934 >> /usr/lib/locale/en_US.utf8/LC_TELEPHONE >> bash 20176 dem mem REG 252,0 23 1179935 >> /usr/lib/locale/en_US.utf8/LC_MEASUREMENT >> bash 20176 dem mem REG 252,0 26048 917676 >> /usr/lib/gconv/gconv-modules.cache >> bash 20176 dem mem REG 252,0 373 1179936 >> /usr/lib/locale/en_US.utf8/LC_IDENTIFICATION >> >> Version of Riak is 0.12.1. There was a similar problem once and the >> user was advised to make sure to use 0.12.1 >> >> Any ideas? >> >> -- >> Best regards, >> Dmitry Demeshchuk >> > > > > -- > Best regards, > Dmitry Demeshchuk > > ___ > riak-users mailing list > riak-users@lists.basho.com > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com > _
Re: File descriptor leaks?
Hi, Dmitry. On Mon, Oct 18, 2010 at 11:07 PM, Dmitry Demeshchuk wrote: > We are using 0.12.1. There was indeed a file descriptor leak in that version of Riak, fixed between then and the 0.13 release. I hadn't seen any situations which were causing it to take effect nearly as quickly as you're describing, but nonetheless an upgrade should get rid of the problem. Best, -Justin ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: RiakSearch Backend Innostore ?
On Sat, Oct 30, 2010 at 11:51 AM, Prometheus wrote: > Can we use Innostore for RiakSearch ? what is the performance comparison for > search backends ? Any information will be valuable. That depends on whether you mean the actual Search index backend, or the KV backend used for storing complete analyzed documents. There is currently only one backend (merge_index) that works under Riak Search, but one could certainly swap out the KV part's backend if bitcask wasn't a good fit for that part. The relative performance characteristics of bitcask and innostore under Riak KV are well known in general, but no top-to-bottom testing of Riak Search that I know of has been performed using innostore as the KV backend. -Justin ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Riak and Locks
Hello, Neville. On Mon, Nov 8, 2010 at 10:35 PM, Neville Burnell wrote: > Are there any plans for a Distributed Lock Service for Riak, to allow for > apps that *need* locking for some KV ? It has been discussed and agreed that it would be interesting, but nothing is currently being developed in the short term to provide this service as an integral part of Riak. If your application needs locking, some part of it other than Riak will need to provide that functionality. -Justin ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Understanding Riaks rebalancing and handoff behaviour
On Tue, Nov 9, 2010 at 10:30 AM, Alexander Sicular wrote: > Mainly, I'm of the impression that you should join/leave a cluster one > node at a time. This impression is correct. I believe that in the not-too-distant future a feature may be added to enable stable addition of many nodes at once, but at this time the right approach is to add a node, allow the ringstate to stabilize through gossip, then repeat as needed. -Justin ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: How could we test/simulate siblings?
Hi, Cagdas.

On Fri, Nov 12, 2010 at 8:17 PM, Cagdas Tulek wrote: > What is the best way of creating sibling records to see if my logic is > handling them correctly?

1. Ensure that allow_mult is set to true.
2. Create some object B/K.
3. Get that object. It will come with some vector clock V.
4. Put some new value X to B/K, using vector clock V in the put request.
5. Put some new value Y to B/K, using vector clock V in the put request. (A different value, but the same vclock as the previous put.)
6. Get B/K. You should get multiple values and some vclock V1.

If you wish to resolve back to a single value, store some new value Z using vclock V1.

-Justin ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
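Expressed with the Erlang PB client (riakc), the recipe looks roughly like this; bucket, key, and values are illustrative:

    {ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087),
    ok = riakc_pb_socket:set_bucket(Pid, <<"b">>, [{allow_mult, true}]),
    ok = riakc_pb_socket:put(Pid, riakc_obj:new(<<"b">>, <<"k">>, <<"v0">>)),
    {ok, Obj1} = riakc_pb_socket:get(Pid, <<"b">>, <<"k">>),
    %% Two puts that both descend from Obj1's vclock create siblings:
    ok = riakc_pb_socket:put(Pid, riakc_obj:update_value(Obj1, <<"X">>)),
    ok = riakc_pb_socket:put(Pid, riakc_obj:update_value(Obj1, <<"Y">>)),
    {ok, Obj2} = riakc_pb_socket:get(Pid, <<"b">>, <<"k">>),
    Siblings = riakc_obj:get_values(Obj2). %% expect two values; order unspecified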
Re: Riak won't die all the way on OS X
Jon, You can just leave epmd running. That is standard erlang runtime behavior and generally won't cause any problems. -Justin On Tue, Nov 30, 2010 at 9:59 AM, Jon Brisbin wrote: > I'm running the pre-built binaries for Riak 0.13 (and 0.12 x64, for that > matter) for OS X 10.6. > When I do a "riak stop", there is one process still running. The epmd > -daemon process. I have to kill it manually. > In my testing, I'm starting 0.13, running a test, then shutting it down, > starting 0.12 and running another test. If I'm not switching versions, then > I just leave it running. > Will this cause a problem if I restart the server and leave this last > process running? What about if I switch from 0.13 to 0.12 (or vice versa)? > Will it interfere with anything? Do I even need to kill it? > > Thanks! > J. Brisbin > http://jbrisbin.com/ > > > > > > > ___ > riak-users mailing list > riak-users@lists.basho.com > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com > > ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Storing relationship data
Hi, Bryan. The link data is embedded in the riak_object metadata, so you can easily observe it from outside Riak even when not performing link-walking queries. To see this in action, check out the "Link" headers when using the HTTP interface. -Justin On Thu, Dec 30, 2010 at 6:36 PM, Bryan Nagle wrote: > Hi, > For the project that I am currently working on, we are trying to decide on > the best way to store relational data in riak. At first glance, the obvious > choice looks to be links, however one of our requirements is that this > relationship information has to be sent to a client along with actual data. > The client has to be able to map the relationships between the > data solely from the information it receives while being completely outside > of Riak. > So, I was wondering if anyone had any suggestions? We are considering > either encapsulating our relationship data within the riak store itself (in > the value tied to the key), or using riak links. However, if we use riak > links, then we have to convert those links into data that the client can > receive & understand when sending data, and then convert the data we get > back from the client into riak links; we are wondering if this extra step > is worth implementing this kind of a translation. > Bryan Nagle > Liquid Analytics > bryan.na...@liquidanalytics.com > > ___ > riak-users mailing list > riak-users@lists.basho.com > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com > > ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
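For illustration, an object stored with links typically comes back over HTTP with a header shaped like the following (bucket, keys, and tag here are invented):

    Link: </riak/people/bob>; riaktag="friend", </riak/people/sue>; riaktag="friend"

A client entirely outside Riak can parse that header to reconstruct the relationships without ever issuing a link-walking query.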
Re: allow_multi VS HTTP Conditional PUT
Hi, Eric.

On Mon, Jan 3, 2011 at 1:09 AM, Eric Moritz wrote:
> Hi I just read "Why Vector Clocks are Easy". I am having trouble
> seeing the advantage of letting a stale PUT into production and merge
> afterwards vs HTTP's Conditional PUT, which never lets a stale PUT
> into production.

This is an excellent question, and one that we could discuss for some
time. I am a big fan of HTTP conditional requests, but they are not
always compatible with the operational needs imposed in the interest
of availability.

The main issue is that Riak's approach is designed for a
highly-available distributed system on the server side, while a
standard HTTP conditional PUT mostly makes sense for single-writer (or
at least single-leader) servers. Riak is designed to accept requests
even when arbitrary nodes are down or unable to talk to each other.
That availability goal is in conflict with the typical expectations
around conditional PUT, which are essentially those of an atomic
compare-and-set operation. Since not every node that holds a copy of a
given piece of data can necessarily be reached during a write request,
Riak cannot maintain its intended level of availability and
simultaneously guarantee that you are overwriting exactly the version
that you specify.
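To make the distinction concrete, here is a sketch of a conditional
PUT over Riak's HTTP interface (Python with "requests"; the endpoint
and names are assumptions). Note that it is no stronger than the
client reading first and then deciding to write:

    import requests

    URL = "http://127.0.0.1:8098/riak/accounts/a1"

    # Read the current version to learn its ETag and vclock.
    resp = requests.get(URL)
    etag = resp.headers["ETag"]
    vclock = resp.headers["X-Riak-Vclock"]

    # Conditional write: the node answering the request will refuse
    # (412 Precondition Failed) if its ETag no longer matches...
    resp = requests.put(URL, data="new value",
                        headers={"Content-Type": "text/plain",
                                 "X-Riak-Vclock": vclock,
                                 "If-Match": etag})
    print(resp.status_code)

    # ...but another replica, partitioned away, can still accept a
    # concurrent write, so this is not a cluster-wide atomic CAS.

I hope that this sheds some light on why we have made the choices that
you see in Riak.

-Justin

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com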
Re: PDF or OpenOffice Impresse presentations instead of .KEY + Windows question
Hello, Jérôme.

It looks like Jeremiah has already answered one of your questions, so
I'll take the other.

On Wed, Jan 5, 2011 at 6:41 PM, Jérôme Verstrynge wrote:
> My other question/remark is: there does not seem to be a downloadable
> version of Riak for Windows. Is there a technical reason for this or is it a
> 'religious' issue?

There's certainly no operating-system religion at work here, simply
the limited resources of a small team. A few different people in the
community have been working on Windows support, which we think is a
great idea -- we just don't currently have anyone to spare to make
official Windows releases, QA and benchmark those releases on the
various Windows systems, and so on.

-Justin

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Storing relationship data
Hi, Bryan.

On Thu, Dec 30, 2010 at 8:02 PM, Bryan Nagle wrote:
> Our current setup is we are using webmachine; the client connects to
> webmachine, and webmachine connects to riak via the erlang pbc client. So,
> if we use links, and we want the client to be aware of the relationships, we
> would still have to translate the links into the http response from
> webmachine back to the client; or am I missing something?

You are correct with regard to standard behavior -- if there is a
layer between the client application and Riak, and you wish for the
actual links (as opposed to just the ability to traverse them) to be
visible to that client, then the intermediary must pass along the link
data as well.

There is a rarely-used alternative that might suit the scenario you
described if you find it too annoying to carry the metadata to your
client: you could set the "linkfun" bucket property to examine the
content of objects instead of metadata, and define your own custom
link serialization format to match your custom link storage format.
This would allow you to embed the links directly inside the objects
and still have mapreduce link queries work, but it might break some
other things, such as the HTTP interface to link walking. I don't
generally recommend this path, as you'd be well outside the realm of
"normal" usage, but it is possible.
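If you stick with the standard approach, the translation layer can be
quite small. Here is a rough sketch (Python with "requests"; the host,
port, and Link-header pattern are assumptions based on the HTTP
interface) of an intermediary copying Riak's Link metadata into the
payload it hands to the client:

    import re
    import requests

    BASE = "http://127.0.0.1:8098/riak"
    # Riak renders each link as: </riak/Bucket/Key>; riaktag="Tag"
    LINK_RE = re.compile(r'</riak/([^/]+)/([^>]+)>;\s*riaktag="([^"]+)"')

    def fetch_with_links(bucket, key):
        resp = requests.get(f"{BASE}/{bucket}/{key}")
        links = [{"bucket": b, "key": k, "tag": t}
                 for b, k, t in LINK_RE.findall(resp.headers.get("Link", ""))]
        # Hand the client the value plus the relationships in one payload.
        return {"value": resp.text, "links": links}

-Justin

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com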
Re: Getting all the Keys
On Sat, Jan 22, 2011 at 3:18 PM, Alexander Sicular wrote:
> I'll drop a phat tangent and just mention that I watched @rk's talk at Qcon
> SF 2010 the other day and am kinda crushing on how they implemented
> distributed counters in cassandra (mainlined in 0.7.1 me thinks) which,
> imho, is so choice for a riak implementation it isn't even funny. It was
> like pow pow in da face and my face got melted.

I know that a couple of people have done their own spikes on
distributed counters for Riak and have demonstrated that it's
certainly doable. The question isn't "can it be done," as we know it
can. The tricky questions are about which tradeoffs to make: write
performance, read performance, and so on.

In other words, I am in support of this sort of feature.
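For a flavor of how such counters typically work, here is a minimal
sketch of the classic structure behind them (a grow-only "G-counter";
this is illustrative Python, not Riak or Cassandra code). Each actor
increments only its own slot, and divergent replicas merge by taking
the per-actor maximum, so no increments are lost:

    class GCounter:
        def __init__(self):
            self.counts = {}  # actor id -> count

        def increment(self, actor, n=1):
            self.counts[actor] = self.counts.get(actor, 0) + n

        def value(self):
            # Reads pay the cost of summing every actor's slot --
            # one of the tradeoffs mentioned above.
            return sum(self.counts.values())

        def merge(self, other):
            # Concurrent replicas (e.g. Riak siblings) merge safely.
            for actor, count in other.counts.items():
                self.counts[actor] = max(self.counts.get(actor, 0), count)

    # Two replicas diverge, then reconcile:
    a, b = GCounter(), GCounter()
    a.increment("node-a")
    a.increment("node-a")
    b.increment("node-b")
    a.merge(b)
    assert a.value() == 3

-Justin

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com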