About VM resources: I had suspected there would be a performance hit there,
but I wasn't aware of any actual numbers. Thanks for that.
About the 3 writes / the need for 2 more nodes: I had no idea about that
(multiple writes per object). Riak is very unfamiliar territory for me. I'll
read the guide that was suggested and look at running my tests under a more
"optimal" setup.

Thanks to everyone who responded to this issue. If I get the numbers
increased, I'll be sure to post a followup (in case anyone cares).

On Sun, Nov 28, 2010 at 3:07 PM, Derek Sanderson <zapph...@gmail.com> wrote:

> I'm using the defaults for the python library, so that would be the HTTP
> REST interface. There is support for the PBC interface, which I'm looking
> into using now.
>
> I had suspected that since I wasn't really using Riak in a way that lets it
> shine (i.e., in a cluster of nodes), that might be part of my problem.
>
> Thanks so much for the detailed response.
>
> On Sun, Nov 28, 2010 at 12:10 PM, Greg Steffensen <
> greg.steffen...@gmail.com> wrote:
>
>> This is due to two factors:
>>
>> 1) Durability.  MongoDB stores writes in RAM and flushes them to disk
>> periodically (by default, every 60 seconds, according to this page:
>> http://www.mongodb.org/display/DOCS/Durability+and+Repair).  This means
>> that its writes can seem very, very fast, but if the machine goes down, you
>> could lose up to 60 seconds of data.  Riak writes don't return until the
>> data has actually been persisted to disk.  Cassandra takes the same approach
>> as MongoDB, with the same trade-off.
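>>
>> If you want a rougher apples-to-apples comparison, you can also relax
>> Riak's durability per request. A small sketch (assuming your client
>> version's store() accepts the w/dw quorum arguments, and reusing your
>> MakePerson() helper):
>>
>>     new_obj = bucket.new("p1", MakePerson())
>>     # w=1: return as soon as one replica accepts the write
>>     # dw=0: don't wait for any replica to reach durable storage
>>     new_obj.store(w=1, dw=0, return_body=False)
>>
>> That puts Riak closer to Mongo's "ack before fsync" behavior, so treat
>> the resulting numbers accordingly.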
>>
>> 2) Parallelism.  This test isn't taking advantage of Riak's distributed
>> nature.  Riak really shines when it's run on a cluster of machines: you can
>> make your write throughput almost arbitrarily fast, as long as you're
>> willing to add enough machines to the cluster.
>>
>> I doubt that you'll be able to get single-node Riak to write as fast as
>> Mongo, but I'd guess that the numbers will get a little closer if you do
>> several writes simultaneously in both, multi-threading with python's
>> threading module.  Also, be sure that you're using Riak's protocol buffers
>> interface instead of the REST (HTTP) one, which adds a lot of overhead; I
>> believe the python client supports both.
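>>
>> To make that concrete, here's a rough sketch of both suggestions combined
>> (assuming your client version ships RiakPbcTransport, that MakePerson() is
>> your existing helper, and that each thread gets its own client, since I
>> wouldn't count on a single client being thread-safe):
>>
>>     import threading
>>     from riak import RiakClient, RiakPbcTransport
>>
>>     N_THREADS = 8
>>     PER_THREAD = 1000
>>
>>     def writer(start):
>>         # One client per thread; 8087 is the default protocol buffers port
>>         client = RiakClient(port=8087, transport_class=RiakPbcTransport)
>>         bucket = client.bucket("peopledb")
>>         for i in range(start, start + PER_THREAD):
>>             obj = bucket.new("p" + str(i), MakePerson())
>>             obj.store(return_body=False)
>>
>>     threads = [threading.Thread(target=writer, args=(t * PER_THREAD + 1,))
>>                for t in range(N_THREADS)]
>>     for t in threads:
>>         t.start()
>>     for t in threads:
>>         t.join()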
>>
>> Greg
>>
>>
>>
>> On Sun, Nov 28, 2010 at 11:48 AM, Derek Sanderson <zapph...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I've recently started to explore using Riak (v0.13.0-2) from Python
>>> (v2.6.5) as a datastore, and I've run into a performance issue whose true
>>> origin I'm unsure of. I'd like some input from users who have been working
>>> with Riak and its Python drivers.
>>>
>>> I have 2 tests set up, one for Riak and another for MongoDB, both using
>>> their respectively provided Python drivers. I'm constructing chunks of JSON
>>> data consisting of a Person, who has an Address, and a purchase history
>>> which contains 1 to 20 line items with some data about the item name, cost,
>>> number purchased, etc. It's a very simple mock-up of a purchase history. The
>>> test does this for 1 million "people" (my initial goal was to see how lookups
>>> fared once I reached 1m+ records).
>>>
>>> When using MongoDB, the speed of inserts is incredibly fast. When using
>>> Riak, however, there is a very noticeable lag after each insert, so much so
>>> that when running side by side, the MongoDB test breaks into the 10,000s
>>> before Riak hits its first 1k.
>>>
>>> My main PC is a Windows 7 i7 quad-core with 8 GB of RAM, on which I'm
>>> running 64-bit Ubuntu 10.04 in a VM with 2 GB of memory allotted. On this
>>> VM, I have Riak and MongoDB running concurrently.
>>>
>>> Here is a sample of how I'm using the Riak driver:
>>>
>>>     from riak import RiakClient
>>>
>>>     riak_conn = RiakClient()  # defaults to the HTTP/REST transport
>>>     bucket = riak_conn.bucket("peopledb")
>>>     for i in range(1, 1000001):  # 1 million people: p1 .. p1000000
>>>         try:
>>>             # MakePerson() builds the JSON person/address/history data
>>>             new_obj = bucket.new("p" + str(i), MakePerson())
>>>             new_obj.store(return_body=False)
>>>         except Exception as e:
>>>             print e
>>>
>>> I'm wondering if there is something blatantly wrong that I'm doing. I didn't
>>> see any kind of batch-store method on the bucket (i.e., instead of calling
>>> store on each object, persisting the entire bucket at once). I also wasn't
>>> sure whether this was an issue with my particular setup (maybe the specifics
>>> of my VM are somehow throttling performance) or just a known limitation that
>>> I wasn't aware of.
>>>
>>> To shed some light on the disparity, I refactored my persistence code into
>>> separate methods and used a wrapper to pull out the execution times. Here
>>> is a very condensed list of run times. The method in question, for both
>>> datastores, simply creates a new "Person" and stores it. Nothing else.
>>>
>>> MakeRiakPerson took 40.139 ms
>>> MakeRiakPerson took 40.472 ms
>>> MakeRiakPerson took 40.651 ms
>>> MakeRiakPerson took 51.630 ms
>>> MakeRiakPerson took 36.733 ms
>>>
>>> MakeMongoPerson took 1.810 ms
>>> MakeMongoPerson took 3.619 ms
>>> MakeMongoPerson took 1.036 ms
>>> MakeMongoPerson took 1.275 ms
>>> MakeMongoPerson took 3.656 ms
>>>
>>> Thank you in advance for any help that can be offered here. I'm incredibly
>>> new to Riak as a whole, as well as very inexperienced when it comes to
>>> working in a *nix environment, so I imagine there are countless ways I could
>>> have shot myself in the foot without realizing it.
>>>