Re: Python Driver Write Times

Alexander Sicular Sun, 28 Nov 2010 09:33:23 -0800

Keep in mind that if you are using the standard Riak install, each of your 
writes is actually written three times to disk. The default configuration is 
an n_val of 3 for all buckets. For this reason I usually change the default 
config file or change the n_val of the specific bucket I'm using for testing.
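A minimal sketch of lowering n_val on a throwaway test bucket over HTTP, assuming a local node on the default HTTP port 8098 and the `/riak/<bucket>` bucket-properties resource of the REST interface from releases of this era (both are assumptions worth checking against your version). It uses the Python 3 standard library and only builds the request; sending it is left commented out. `peopledb` is the bucket from the original post.

```python
import json
import urllib.request

def n_val_request(bucket, n_val, host="127.0.0.1", port=8098):
    """Build (but do not send) a PUT that sets a bucket's n_val.

    Assumes the legacy /riak/<bucket> resource, which takes a JSON body
    of the form {"props": {...}}.
    """
    body = json.dumps({"props": {"n_val": n_val}}).encode("utf-8")
    url = "http://%s:%d/riak/%s" % (host, port, bucket)
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

req = n_val_request("peopledb", 1)
# urllib.request.urlopen(req)  # would apply it to a running node
```

With n_val at 1, a single-node benchmark does one replica write per store instead of three. That is only sensible for testing, since it removes the redundancy the default exists to provide.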


-Alexander Sicular

@siculars

On Nov 28, 2010, at 12:20 PM, Jeremiah Peschka wrote:

> To add to this, you won't see write speeds as fast as people have reported in 
> a variety of benchmarks, because of I/O subsystem virtualization. You take a 
> 10-15% performance hit with virtualized disk when using a bare-metal 
> hypervisor like VMware ESX. Depending on your VM software, you could be 
> taking a much larger hit to your disk performance.
> 
> As Greg said, though, the big reasons for this difference are that MongoDB 
> caches writes in memory until the next file system sync interval, and that 
> Riak is designed to run on a cluster of at least 3 machines 
> (preferably more).
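To make the durability trade-off concrete, here is a toy model (not MongoDB's or Riak's actual code) of two file-backed stores: one acknowledges a write only after `os.fsync`, the other buffers writes in memory until an explicit `sync()`. A simulated crash discards whatever is still in memory. The class names are hypothetical.

```python
import os
import tempfile

class WriteThroughStore:
    """Acknowledges a write only after it has been fsync'ed to disk."""
    def __init__(self, path):
        self.path = path

    def write(self, line):
        with open(self.path, "a") as f:
            f.write(line + "\n")
            f.flush()
            os.fsync(f.fileno())  # the slow, durable part

class WriteBackStore:
    """Acknowledges immediately; data reaches disk only on sync()."""
    def __init__(self, path):
        self.path = path
        self.buffer = []

    def write(self, line):
        self.buffer.append(line)  # fast: memory only

    def sync(self):
        with open(self.path, "a") as f:
            f.write("".join(entry + "\n" for entry in self.buffer))
            f.flush()
            os.fsync(f.fileno())
        self.buffer = []

    def crash(self):
        self.buffer = []  # anything not yet synced is lost

def lines_on_disk(path):
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return f.read().splitlines()

tmp = tempfile.mkdtemp()
durable = WriteThroughStore(os.path.join(tmp, "durable.log"))
cached = WriteBackStore(os.path.join(tmp, "cached.log"))
for i in range(5):
    durable.write("p%d" % i)
    cached.write("p%d" % i)
cached.crash()  # machine goes down before the next sync interval
```

All five writes survive in the write-through log and none survive in the write-back one. The write-back store's `write` never touches the disk, which is exactly why it looks so much faster in a per-insert benchmark.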
> 
> Jeremiah Peschka
> Microsoft SQL Server MVP
> MCITP: Database Developer, DBA
> 
> 
> 
> On Sun, Nov 28, 2010 at 12:10 PM, Greg Steffensen <greg.steffen...@gmail.com> 
> wrote:
> This is due to two factors:
> 
> 1) Durability.  MongoDB stores writes in RAM and flushes them to disk 
> periodically (by default, every 60 seconds, according to this page: 
> http://www.mongodb.org/display/DOCS/Durability+and+Repair).  This means that 
> its writes can seem very, very fast, but if the machine goes down, you could 
> lose up to 60 seconds of data.  Riak writes don't return until the data has 
> actually been persisted to disk.  Cassandra takes the same approach as 
> MongoDB, with the same trade-off.
> 
> 2) Parallelism.  This test isn't taking advantage of Riak's distributed 
> nature.  Riak really shines when it's run on a cluster of machines: you can 
> make your write throughput almost arbitrarily high, as long as you're 
> willing to add enough machines to the cluster.
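A simplified model of why more machines help: Riak hashes each bucket/key with SHA-1 onto a 160-bit ring split into a fixed number of partitions spread across the nodes, and a write goes to the n_val partitions starting at the key's position. This sketch assumes 64 partitions, hashes a plain `bucket/key` string, and assigns partitions to nodes round-robin; real Riak hashes a serialized {bucket, key} term and uses a more careful claim algorithm, so treat it as an illustration only.

```python
import hashlib
from collections import Counter

RING_SIZE = 64        # number of partitions (assumed)
RING_TOP = 2 ** 160   # size of the SHA-1 output space

def partition_for(bucket, key):
    """Map a bucket/key pair to a partition index on the ring."""
    digest = hashlib.sha1((bucket + "/" + key).encode("utf-8")).digest()
    point = int.from_bytes(digest, "big")
    return point * RING_SIZE // RING_TOP

def owners(nodes):
    """Round-robin claim: partition i belongs to nodes[i % len(nodes)]."""
    return [nodes[i % len(nodes)] for i in range(RING_SIZE)]

def preference_list(bucket, key, nodes, n_val=3):
    """Nodes owning the n_val consecutive partitions for this key."""
    ring = owners(nodes)
    first = partition_for(bucket, key)
    return [ring[(first + i) % RING_SIZE] for i in range(n_val)]

def primary_load(nodes, num_keys=10000, bucket="peopledb"):
    """Count how many keys each node owns the first partition for."""
    ring = owners(nodes)
    return Counter(ring[partition_for(bucket, "p%d" % i)]
                   for i in range(1, num_keys + 1))
```

With one node, every key (and all three replicas) lands on the same machine; with four nodes, `primary_load` splits the keys roughly evenly and each key's three replicas land on three different nodes, so concurrent clients spread their writes across the cluster.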
> 
> I doubt that you'll be able to get single-node Riak to write as fast as 
> Mongo, but I'd guess that the numbers will get a little closer if you do 
> several writes simultaneously in both, by multi-threading with Python's 
> threading module.  Also, be sure that you're using Riak's protocol buffers 
> interface instead of the REST (HTTP) one, which adds a lot of overhead; I 
> believe the Python client supports both.
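A minimal sketch of the multi-threading suggestion using the standard `threading` and `queue` modules. `store` is any callable taking `(key, doc)`; with the Riak client it might wrap `bucket.new(key, doc).store()`, ideally with one client per thread, since sharing a single client across threads is an assumption this sketch does not rely on. `fake_store` is a hypothetical stand-in that sleeps to simulate about 10 ms of network latency.

```python
import queue
import threading
import time

def parallel_store(items, store, num_threads=8):
    """Store (key, doc) pairs using a fixed pool of worker threads."""
    work = queue.Queue()
    for item in items:
        work.put(item)
    errors = []

    def worker():
        while True:
            try:
                key, doc = work.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker is done
            try:
                store(key, doc)
            except Exception as e:  # keep going, like the original loop
                errors.append((key, e))

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors

stored = {}
stored_lock = threading.Lock()

def fake_store(key, doc):
    time.sleep(0.01)  # pretend network round trip
    with stored_lock:
        stored[key] = doc

items = [("p%d" % i, {"id": i}) for i in range(1, 101)]
start = time.time()
errs = parallel_store(items, fake_store, num_threads=10)
elapsed = time.time() - start
```

Serially, 100 writes at about 10 ms each would take about a second; with 10 threads the waits overlap and the batch finishes in roughly a tenth of that. Threads help here because the time is spent waiting on I/O rather than running Python bytecode, so the GIL is not the bottleneck.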
> 
> Greg
> 
> 
> 
> On Sun, Nov 28, 2010 at 11:48 AM, Derek Sanderson <zapph...@gmail.com> wrote:
> Hello,
> 
> I've recently started to explore using Riak (v0.13.0-2) from Python (v2.6.5) 
> as a datastore, and I've run into a performance issue whose origin I'm 
> unsure of. I'd like some input from users who have been working with Riak 
> and its Python drivers.
> 
> I have 2 tests set up, one for Riak and another for MongoDB, each using its 
> respective Python driver. I'm constructing chunks of JSON data 
> consisting of a Person, who has an Address, and a purchase history which 
> contains 1 to 20 line items with some data about the item name, cost, number 
> purchased, etc. A very simple mockup of a purchase history. It does this for 
> 1 million "people" (my initial goal was to see how lookups fared when you 
> reach 1m+ records).
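The poster's `MakePerson()` is not shown, so here is a hypothetical stand-in that builds a document of the shape described: a person with an address and 1 to 20 purchase line items. All field names and value ranges are invented for illustration.

```python
import json
import random

def make_person(person_id, rng=random):
    """Hypothetical MakePerson(): one person, one address, 1-20 line items."""
    purchases = []
    for _ in range(rng.randint(1, 20)):
        purchases.append({
            "item": "item-%d" % rng.randint(1, 500),
            "cost": round(rng.uniform(0.5, 200.0), 2),
            "quantity": rng.randint(1, 5),
        })
    return {
        "name": "Person %d" % person_id,
        "address": {"street": "%d Main St" % person_id, "city": "Springfield"},
        "purchases": purchases,
    }

doc = make_person(1, random.Random(42))  # seeded for repeatable test data
payload = json.dumps(doc)                # what actually goes over the wire
```

Passing a seeded `random.Random` makes runs repeatable, which helps when comparing two datastores on identical data.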
> 
> When using MongoDB, inserts are incredibly fast. When using Riak, 
> however, there is a very noticeable lag after each insert. So much so that 
> when running side by side, the MongoDB test breaks into the 10,000s before 
> Riak hits its first 1,000.
> 
> My main PC is a Windows 7 i7 quad core with 8 GB of RAM, on which I'm 
> running 64-bit Ubuntu 10.04 in a VM that has 2 GB of memory allotted. On 
> this VM, I have Riak and MongoDB running concurrently.
> 
> Here is a sample of how I'm using the Riak driver:
> 
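>     from riak import RiakClient
> 
>     riak_conn = RiakClient()  # client's default host and port
>     bucket = riak_conn.bucket("peopledb")
>     # One synchronous round trip per person; MakePerson() builds the
>     # JSON person document (defined elsewhere).
>     for i in range(1, 1000001):  # keys p1 .. p1000000
>         try:
>             new_obj = bucket.new("p" + str(i), MakePerson())
>             new_obj.store(return_body=False)  # don't send the object back
>         except Exception as e:
>             print(e)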
> 
> I'm wondering if there is something blatantly wrong with what I'm doing. I 
> didn't see any kind of batch-store method on the bucket (one that, instead of 
> calling store on each object, would simply persist the whole bucket at once), 
> and I wasn't sure whether this was an issue with my particular setup (maybe 
> the specifics of my VM are somehow throttling its performance) or just a 
> known limitation that I wasn't aware of.
> 
> To shed some light on the disparity, I refactored my persistence into 
> separate methods and used a wrapper to measure the execution times. Here is 
> a very condensed list of run times. The method in question, for both 
> datastores, simply creates a new "Person" and stores it. Nothing else.
> 
> MakeRiakPerson took 40.139 ms
> MakeRiakPerson took 40.472 ms
> MakeRiakPerson took 40.651 ms
> MakeRiakPerson took 51.630 ms
> MakeRiakPerson took 36.733 ms
> 
> MakeMongoPerson took 1.810 ms
> MakeMongoPerson took 3.619 ms
> MakeMongoPerson took 1.036 ms
> MakeMongoPerson took 1.275 ms
> MakeMongoPerson took 3.656 ms
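The timing wrapper mentioned above isn't shown; a minimal decorator in that spirit (an assumption, not the poster's code) prints lines in the same "took N ms" format. `projected_hours` extrapolates a serial run time from the mean of the quoted samples, and `make_riak_person_stub` is a hypothetical placeholder for the real create-and-store call. Written for Python 3.3+ (`time.perf_counter`).

```python
import functools
import time

TIMINGS = []  # (function name, elapsed ms) for every timed call

def timed(func):
    """Print and record how long each call takes, in milliseconds."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            ms = (time.perf_counter() - start) * 1000.0
            TIMINGS.append((func.__name__, ms))
            print("%s took %.3f ms" % (func.__name__, ms))
    return wrapper

def projected_hours(samples_ms, num_writes=1000000):
    """Serial run time implied by the mean per-write latency."""
    mean_ms = sum(samples_ms) / len(samples_ms)
    return mean_ms * num_writes / 1000.0 / 3600.0

riak_ms = [40.139, 40.472, 40.651, 51.630, 36.733]
mongo_ms = [1.810, 3.619, 1.036, 1.275, 3.656]

@timed
def make_riak_person_stub():
    return "p1"  # placeholder for building and storing one person

result = make_riak_person_stub()
```

At the quoted means (about 41.9 ms vs. 2.3 ms per write), one million serial writes would take roughly 11.6 hours against Riak and about 38 minutes against MongoDB, which is why overlapping requests matters more than shaving individual calls.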
> 
> Thank you in advance for any help that can be offered here. I'm incredibly 
> new to Riak as a whole, as well as very inexperienced when it comes to 
> working in a *nix environment, so I imagine there are countless ways I could 
> have shot myself in the foot without realizing it.
> 
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com