About VM resources: I had suspected there would be a performance hit there, but I wasn't aware of any actual numbers. Thanks for that.
About the 3 writes / the need for 2 more nodes: I had no idea about that (several writes per object, three by default). Riak is very unfamiliar territory for me. I'll read the guide that has been suggested and look at running my tests under a more "optimal" setup (a sketch of such a setup follows the quoted thread below). Thanks to everyone who responded to this issue. If I get the numbers increased, I'll be sure to post a follow-up (in case anyone cares).

On Sun, Nov 28, 2010 at 3:07 PM, Derek Sanderson <zapph...@gmail.com> wrote:

> I'm using the defaults for the Python library, so that would be the HTTP
> REST interface. There is support for the PBC interface, which I'm looking
> into using now.
>
> I had suspected that since I wasn't really using Riak in such a way as to
> let it shine (i.e., in a cluster of nodes), that might be part of my
> problem.
>
> Thanks so much for the detailed response.
>
> On Sun, Nov 28, 2010 at 12:10 PM, Greg Steffensen <greg.steffen...@gmail.com> wrote:
>
>> This is due to two factors:
>>
>> 1) Durability. MongoDB stores writes in RAM and flushes them to disk
>> periodically (by default, every 60 seconds, according to this page:
>> http://www.mongodb.org/display/DOCS/Durability+and+Repair). This means
>> that its writes can seem very, very fast, but if the machine goes down,
>> you could lose up to 60 seconds of data. Riak writes don't return until
>> the data has actually been persisted to disk. Cassandra takes the same
>> approach as MongoDB, with the same trade-off.
>>
>> 2) Parallelism. This test isn't taking advantage of Riak's distributed
>> nature. Riak really shines when it's run on a cluster of machines: you
>> can make your write throughput almost arbitrarily fast, as long as
>> you're willing to add enough machines to the cluster.
>>
>> I doubt that you'll be able to get single-node Riak to write as fast as
>> Mongo, but I'd guess that the numbers will get a little closer if you do
>> several writes simultaneously in both, by multi-threading using Python's
>> threading module. Also, be sure that you're using Riak's protocol
>> buffers interface instead of the REST (HTTP) one, which adds a lot of
>> overhead; I believe the Python client supports both.
>>
>> Greg
>>
>> On Sun, Nov 28, 2010 at 11:48 AM, Derek Sanderson <zapph...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I've recently started to explore using Riak (v0.13.0-2) from Python
>>> (v2.6.5) as a datastore, and I've run into a performance issue whose
>>> true origin I'm unsure of. I'd like some input from users who have
>>> been working with Riak and its Python drivers.
>>>
>>> I have 2 tests set up, one for Riak and another for MongoDB, both
>>> using their respectively provided Python drivers. I'm constructing
>>> chunks of JSON data consisting of a Person, who has an Address, and a
>>> purchase history which contains 1 to 20 line items with some data
>>> about the item name, cost, # purchased, etc. A very simple mockup of
>>> a purchase history. It does this for 1 million "people" (my initial
>>> goal was to see how lookups fared when you reach 1m+ records).
>>>
>>> When using MongoDB, the speed of inserts is incredibly fast. When
>>> using Riak, however, there is a very noticeable lag after each
>>> insert. So much so that when running side by side, the MongoDB test
>>> breaks into the 10,000s before Riak hits its first 1k.
>>>
>>> My main PC is a Windows 7 i7 quad core with 8 GB of RAM, on which I'm
>>> running 64-bit Ubuntu 10.04 in a VM with 2 GB of memory allotted. On
>>> this VM, I have Riak and MongoDB running concurrently.
>>> Here is a sample of how I'm using the Riak driver:
>>>
>>> from riak import RiakClient
>>>
>>> riak_conn = RiakClient()
>>> bucket = riak_conn.bucket("peopledb")
>>> for i in range(1, 1000000):
>>>     try:
>>>         new_obj = bucket.new("p" + str(i), MakePerson())
>>>         new_obj.store(return_body=False)
>>>     except Exception as e:
>>>         print e
>>>
>>> I'm wondering if there is something blatantly wrong I'm doing. I
>>> didn't see any kind of batch-store method on the bucket (instead of
>>> calling store on each object, simply persisting the entirety of the
>>> bucket itself), and I wasn't sure whether this was an issue with my
>>> particular setup (maybe the specifics of my VM are somehow throttling
>>> its performance), or maybe just a known limitation that I wasn't
>>> aware of.
>>>
>>> To shed some light on the disparity, I refactored my persistence into
>>> separate methods and used a wrapper to pull out the execution times.
>>> Here is a very condensed list of run times. The method in question,
>>> for both datastores, simply creates a new "Person" and stores it.
>>> Nothing else.
>>>
>>> MakeRiakPerson took 40.139 ms
>>> MakeRiakPerson took 40.472 ms
>>> MakeRiakPerson took 40.651 ms
>>> MakeRiakPerson took 51.630 ms
>>> MakeRiakPerson took 36.733 ms
>>>
>>> MakeMongoPerson took 1.810 ms
>>> MakeMongoPerson took 3.619 ms
>>> MakeMongoPerson took 1.036 ms
>>> MakeMongoPerson took 1.275 ms
>>> MakeMongoPerson took 3.656 ms
>>>
>>> Thank you in advance for any help that can be offered here. I'm
>>> incredibly new to Riak as a whole, as well as very inexperienced when
>>> it comes to working in a *nix environment, so I imagine there are
>>> countless ways I could have shot myself in the foot without realizing
>>> it.
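A minimal sketch of the setup suggested in the thread: the protocol buffers transport, a few writer threads, and replication dialed down for a single-node test. It assumes the 2010-era riak-python-client, which exposed a transport_class argument, a RiakPbcTransport class, and set_n_val() on buckets (verify these against your client version); MakePerson() is stubbed out here so the sketch runs on its own.

    import threading
    import riak

    NUM_THREADS = 8
    TOTAL = 1000000

    def MakePerson():
        # Stand-in for the original MakePerson() generator, just so the
        # sketch is self-contained.
        return {'name': 'Person', 'address': {'city': 'Springfield'},
                'purchases': []}

    def make_client():
        # PBC listens on port 8087 by default (the REST interface is 8098).
        return riak.RiakClient(host='127.0.0.1', port=8087,
                               transport_class=riak.RiakPbcTransport)

    # For a single-node test, drop n_val from the default of 3 to 1 so
    # each object is persisted once instead of three times.
    make_client().bucket('peopledb').set_n_val(1)

    def writer(thread_id):
        # One client (and therefore one connection) per thread; the old
        # client was not guaranteed to be safe to share across threads.
        bucket = make_client().bucket('peopledb')
        # Stripe the keyspace so each thread writes a disjoint slice.
        for i in xrange(thread_id + 1, TOTAL, NUM_THREADS):
            try:
                bucket.new('p' + str(i), MakePerson()).store(return_body=False)
            except Exception as e:
                print e

    threads = [threading.Thread(target=writer, args=(t,))
               for t in range(NUM_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()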
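The timing wrapper itself isn't shown in the thread; a hypothetical version that would produce the "took N ms" lines quoted above might look like this:

    import time
    from functools import wraps

    def timed(fn):
        # Report each call's wall-clock duration in milliseconds.
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.time()
            result = fn(*args, **kwargs)
            print '%s took %.3f ms' % (fn.__name__,
                                       (time.time() - start) * 1000.0)
            return result
        return wrapper

    @timed
    def MakeRiakPerson(bucket, i):
        # One timed unit of work: build a person and store it (reusing
        # bucket and MakePerson() from the sketch above).
        bucket.new('p' + str(i), MakePerson()).store(return_body=False)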
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com