Keep in mind that if you are using the standard Riak install, each of your writes is actually written to disk three times. The default configuration is an n_val of 3 for all buckets. For this reason I usually change the default config file or change the n_val of the specific bucket I'm using for testing.
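As a rough sketch of the per-bucket route: the snippet below builds the HTTP request that sets n_val on a single bucket by PUTting a "props" document to the bucket URL. The host, port 8098 and the "peopledb" bucket name are assumptions for illustration, and build_n_val_request is a made-up helper name, not part of any client library. It is written for Python 3 (the original post uses Python 2.6) and only constructs the request, it does not send it.

```python
import json
import urllib.request


def build_n_val_request(bucket, n_val, host="127.0.0.1", port=8098):
    """Build (but do not send) a PUT that sets n_val on one bucket.

    host and port are assumed defaults for a local test node.
    """
    url = "http://%s:%d/riak/%s" % (host, port, bucket)
    body = json.dumps({"props": {"n_val": n_val}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Example: ask for a single replica on the test bucket.
req = build_n_val_request("peopledb", 1)
# To apply it against a running node: urllib.request.urlopen(req)
```

With n_val set to 1 on a single test node, each write should land on disk once instead of three times, which makes a single-machine comparison somewhat fairer.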
-Alexander Sicular

@siculars

On Nov 28, 2010, at 12:20 PM, Jeremiah Peschka wrote:

> To add to this, you won't see write speeds as fast as people have reported in
> a variety of benchmarks because of I/O subsystem virtualization. You take a
> 10-15% performance hit with virtualized disk when using a pure hypervisor
> like VMware ESX. Depending on your VM software, you could be taking a much
> larger hit to your disk performance.
>
> As Greg said, though, the big reasons for this difference are that MongoDB
> caches writes in memory until the next file system sync interval and the fact
> that Riak is designed to be run in a cluster of at least 3 machines
> (preferably more).
>
> Jeremiah Peschka
> Microsoft SQL Server MVP
> MCITP: Database Developer, DBA
>
>
>
> On Sun, Nov 28, 2010 at 12:10 PM, Greg Steffensen <greg.steffen...@gmail.com>
> wrote:
> This is due to two factors:
>
> 1) Durability. MongoDB stores writes in RAM and flushes them to disk
> periodically (by default, every 60 seconds, according to this page:
> http://www.mongodb.org/display/DOCS/Durability+and+Repair). This means that
> its writes can seem very, very fast, but if the machine goes down, you could
> lose up to 60 seconds of data. Riak writes don't return until the data has
> actually been persisted to disk. Cassandra takes the same approach as
> MongoDB, with the same trade-off.
>
> 2) Parallelism. This test isn't taking advantage of Riak's distributed
> nature. Riak really shines when it's run on a cluster of machines; you can
> make your write throughput almost arbitrarily fast, as long as you're willing
> to add enough machines to the cluster.
>
> I doubt that you'll be able to get single-node Riak to write as fast as
> Mongo, but I'd guess that the numbers will get a little closer if you do
> several writes simultaneously in both by multi-threading using Python's
> threading module.
> Also, be sure that you're using Riak's protocol buffers
> interface, instead of the REST (HTTP) one, which adds a lot of overhead. I
> believe the Python client supports both.
>
> Greg
>
>
>
> On Sun, Nov 28, 2010 at 11:48 AM, Derek Sanderson <zapph...@gmail.com> wrote:
> Hello,
>
> I've recently started to explore using Riak (v0.13.0-2) from Python (v2.6.5)
> as a datastore, and I've run into a performance issue whose true origin I'm
> unsure of. I would like some input from users who have been working
> with Riak and its Python drivers.
>
> I have 2 tests set up, one for Riak and another for MongoDB, both using their
> respectively provided Python drivers. I'm constructing chunks of JSON data
> consisting of a Person, who has an Address, and a purchase history which
> contains 1 to 20 line items with some data about the item name, cost, #
> purchased, etc. A very simple mockup of a purchase history. It does this for 1
> million "people" (my initial goal was to see how lookups fared when you reach
> 1m+ records).
>
> When using MongoDB, the speed of inserts is incredibly fast. When using Riak,
> however, there is a very noticeable lag after each insert. So much so that
> when running side by side, the MongoDB test breaks into the 10,000s before
> Riak hits its first 1k.
>
> My main PC is a Windows 7 i7 quad core, with 8 gigs of RAM, on which I'm
> running Ubuntu64 v10.04 on a VM, which has 2GB of memory allotted. On this
> VM, I have Riak and MongoDB running concurrently.
>
> Here is a sample of how I'm using the Riak driver:
>
> from riak import RiakClient
>
> riak_conn = RiakClient()
> bucket = riak_conn.bucket("peopledb")
> # MakePerson() builds one person record (defined elsewhere)
> for i in range(1, 1000000):
>     try:
>         # one object per person, keyed "p1", "p2", ...
>         new_obj = bucket.new("p" + str(i), MakePerson())
>         new_obj.store(return_body=False)
>     except Exception as e:
>         print(e)
>
> I'm wondering if there is something blatantly wrong I'm doing.
> I didn't see
> any kind of batch-store method on the bucket (instead of calling store on
> each object, simply persist the entirety of the bucket itself), and I wasn't
> sure if this was an issue with my particular setup (maybe the specifics of my
> VM are somehow throttling its performance), or maybe just a known limitation
> that I wasn't aware of.
>
> To shed some light on the disparity, I refactored my persistence into
> separate methods, and used a wrapper to pull out the execution times. Here is
> a very condensed list of run times. The method in question, for both
> datastores, simply creates a new "Person" and stores it. Nothing else.
>
> MakeRiakPerson took 40.139 ms
> MakeRiakPerson took 40.472 ms
> MakeRiakPerson took 40.651 ms
> MakeRiakPerson took 51.630 ms
> MakeRiakPerson took 36.733 ms
>
> MakeMongoPerson took 1.810 ms
> MakeMongoPerson took 3.619 ms
> MakeMongoPerson took 1.036 ms
> MakeMongoPerson took 1.275 ms
> MakeMongoPerson took 3.656 ms
>
> Thank you in advance for any help that can be offered here. I'm incredibly new
> to Riak as a whole, as well as very inexperienced when it comes to working in
> a *nix environment, so I imagine there are countless ways I could have shot
> myself in the foot without realizing it.
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
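
For anyone wanting to try the multi-threading suggestion from earlier in the thread, here is a minimal sketch using Python's threading module (written for Python 3). store_concurrently and fake_store are hypothetical names invented for this example; store_fn stands in for whatever performs a single write, such as a function that creates one object and calls store() on it. Whether one client connection can safely be shared between threads is not something this thread confirms, so each worker may need its own client.

```python
import queue
import threading


def store_concurrently(store_fn, items, num_threads=8):
    """Call store_fn(item) for every item using a fixed pool of threads.

    store_fn is a hypothetical stand-in for one database write.
    Returns a list of (item, exception) pairs for writes that failed.
    """
    work = queue.Queue()
    for item in items:
        work.put(item)

    failures = []
    failures_lock = threading.Lock()

    def worker():
        while True:
            try:
                item = work.get_nowait()
            except queue.Empty:
                return  # queue drained, this thread is done
            try:
                store_fn(item)
            except Exception as exc:
                with failures_lock:
                    failures.append((item, exc))

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return failures


# Demonstration with an in-memory fake store instead of a real database.
stored = []
stored_lock = threading.Lock()


def fake_store(key):
    with stored_lock:
        stored.append(key)


keys = ["p%d" % i for i in range(1, 101)]
failed = store_concurrently(fake_store, keys, num_threads=4)
```

Swapping fake_store for a real write function applies the same pattern to either datastore, so both tests can be run with the same number of concurrent writers.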