Question about Bitcask
Is there any way to use it without Erlang, just with C?
Python Driver Write Times
Hello,

I've recently started to explore using Riak (v0.13.0-2) from Python (v2.6.5) as a datastore, and I've run into a performance issue whose true origin I'm unsure of. I would like some input from users who have been working with Riak and its Python drivers.

I have two tests set up, one for Riak and another for MongoDB, both using their respective Python drivers. I'm constructing chunks of JSON data consisting of a Person, who has an Address, and a purchase history which contains 1 to 20 line items with some data about the item name, cost, # purchased, etc. A very simple mockup of a purchase history. It does this for 1 million "people" (my initial goal was to see how lookups fared when you reach 1M+ records).

When using MongoDB, the speed of inserts is incredibly fast. When using Riak, however, there is a very noticeable lag after each insert. So much so that when running side by side, the MongoDB test breaks into the 10,000s before Riak hits its first 1k.

My main PC is a Windows 7 i7 quad core with 8 GB of RAM, on which I'm running 64-bit Ubuntu 10.04 in a VM with 2 GB of memory allotted. On this VM, I have Riak and MongoDB running concurrently.

Here is a sample of how I'm using the Riak driver:

    riak_conn = RiakClient()
    bucket = riak_conn.bucket("peopledb")
    for i in range(1, 100):
        try:
            new_obj = bucket.new("p" + str(i), MakePerson())
            new_obj.store(return_body=False)
        except Exception as e:
            print e

I'm wondering if there is something blatantly wrong I'm doing. I didn't see any kind of batch-store method on the bucket (instead of calling store on each object, simply persist the entirety of the bucket itself), and I wasn't sure if this was an issue with my particular setup (maybe the specifics of my VM are somehow throttling its performance), or maybe just a known limitation that I wasn't aware of.

To shed some light on the disparity, I refactored my persistence into separate methods, and used a wrapper to pull out the execution times. Here is a very condensed list of run times. The method in question, for both datastores, simply creates a new "Person" and stores it. Nothing else.

    MakeRiakPerson took 40.139 ms
    MakeRiakPerson took 40.472 ms
    MakeRiakPerson took 40.651 ms
    MakeRiakPerson took 51.630 ms
    MakeRiakPerson took 36.733 ms

    MakeMongoPerson took 1.810 ms
    MakeMongoPerson took 3.619 ms
    MakeMongoPerson took 1.036 ms
    MakeMongoPerson took 1.275 ms
    MakeMongoPerson took 3.656 ms

Thank you in advance for any help that can be offered here. I'm incredibly new to Riak as a whole, as well as very inexperienced when it comes to working in a *nix environment, so I imagine there are countless ways I could have shot myself in the foot without realizing it.
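[The message above mentions a timing wrapper without showing it. For reference, a minimal sketch of such a decorator in Python 2.6; the wrapper itself and the body of MakeRiakPerson are illustrative assumptions, not code from the original post:]

    import time
    from functools import wraps

    def timed(fn):
        """Print how long each call to fn takes, in milliseconds."""
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.time()
            result = fn(*args, **kwargs)
            elapsed_ms = (time.time() - start) * 1000.0
            print "%s took %.3f ms" % (fn.__name__, elapsed_ms)
            return result
        return wrapper

    @timed
    def MakeRiakPerson(bucket, key, person):
        # Create and store one record; the decorator reports the elapsed time.
        bucket.new(key, person).store(return_body=False)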
Re: Python Driver Write Times
This is due to two factors:

1) Durability. MongoDB stores writes in RAM and flushes them to disk periodically (by default, every 60 seconds, according to this page: http://www.mongodb.org/display/DOCS/Durability+and+Repair). This means that its writes can seem very, very fast, but if the machine goes down, you could lose up to 60 seconds of data. Riak writes don't return until the data has actually been persisted to disk. Cassandra takes the same approach as MongoDB, with the same trade-off.

2) Parallelism. This test isn't taking advantage of Riak's distributed nature. Riak really shines when it's run on a cluster of machines: you can make your write throughput almost arbitrarily fast, as long as you're willing to add enough machines to the cluster.

I doubt that you'll be able to get single-node Riak to write as fast as Mongo, but I'd guess that the numbers will get a little closer if you do several writes simultaneously in both by multi-threading using Python's threading module. Also, be sure that you're using Riak's Protocol Buffers interface instead of the REST (HTTP) one, which adds a lot of overhead; I believe the Python client supports both.

Greg
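[A minimal sketch of the kind of multi-threaded write test Greg describes, assuming the hypothetical MakePerson helper from the original post; thread and batch counts are arbitrary, not a recommendation from the thread:]

    import threading
    from riak import RiakClient

    def write_range(bucket, start, stop):
        """Store people with keys p<start>..p<stop-1> sequentially."""
        for i in range(start, stop):
            bucket.new("p" + str(i), MakePerson()).store(return_body=False)

    # Split 1,000 inserts across 10 worker threads, each with its own
    # connection, so multiple writes are in flight against the node at once.
    threads = []
    for t in range(10):
        conn = RiakClient()  # one client per thread
        bucket = conn.bucket("peopledb")
        th = threading.Thread(target=write_range,
                              args=(bucket, t * 100, (t + 1) * 100))
        threads.append(th)
        th.start()
    for th in threads:
        th.join()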
Re: Python Driver Write Times
To add to this, you won't see write speeds as fast as people have reported in a variety of benchmarks because of I/O subsystem virtualization. You take a 10-15% performance hit with virtualized disk when using a pure hypervisor like VMware ESX. Depending on your VM software, you could be taking a much larger hit to your disk performance.

As Greg said, though, the big reasons for this difference are that MongoDB caches writes in memory until the next file system sync interval and that Riak is designed to be run in a cluster of at least 3 machines (preferably more).

Jeremiah Peschka
Microsoft SQL Server MVP
MCITP: Database Developer, DBA
Re: Python Driver Write Times
Keep in mind that if you are using the standard Riak install, each of your writes is actually written three times to disk. The default configuration is an n_val of 3 for all buckets. For this reason I usually change the default config file or change the n_val of the specific bucket I'm using for testing.

-Alexander Sicular

@siculars
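[For a single-node test, the bucket's replica count can be lowered from the client. A minimal sketch; set_n_val was the bucket-level setter in the Python client of this era, but treat the exact method name as an assumption, and note that n_val should be set before data is written to the bucket:]

    from riak import RiakClient

    conn = RiakClient()
    bucket = conn.bucket("peopledb")
    # Lower the replica count for this bucket from the default of 3 to 1,
    # so each store() results in a single on-disk write on a one-node setup.
    bucket.set_n_val(1)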
Re: Python Driver Write Times
You're using the HTTP interface, which is slower than the Protocol Buffers interface. You should change your code from:

    riak_conn = RiakClient()
    bucket = riak_conn.bucket("peopledb")
    for i in range(1, 100):
        try:
            new_obj = bucket.new("p" + str(i), MakePerson())
            new_obj.store(return_body=False)
        except Exception as e:
            print e

To:

    from riak import RiakClient, RiakPbcTransport

    riak_conn = RiakClient(port=8087, transport_class=RiakPbcTransport)
    bucket = riak_conn.bucket("peopledb")
    for i in range(1, 100):
        try:
            new_obj = bucket.new("p" + str(i), MakePerson())
            new_obj.store(return_body=False)
        except Exception as e:
            print e

In my testing, I saw a 200-300% speedup when switching to PB.

As previously suggested, you should also spin up two more Riak nodes (making a cluster size of 3). See https://wiki.basho.com/display/RIAK/Basic+Cluster+Setup, specifically the "Add a Second Node to Your Cluster" section.

Daniel
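[A quick way to confirm the client is actually reaching the node over the new transport; is_alive() was the ping method in the Python client of this era, but treat that as an assumption:]

    from riak import RiakClient, RiakPbcTransport

    client = RiakClient(port=8087, transport_class=RiakPbcTransport)
    # Ping the node over Protocol Buffers; True means the connection works.
    print client.is_alive()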
Re: Python Driver Write Times
I'm using the defaults for the Python library, so that would be the HTTP REST interface. There is support for the PBC interface, which I'm looking into using now.

I had suspected that since I wasn't really using Riak in such a way as to let it shine (i.e., in a cluster of nodes), that might be part of my problem.

Thanks so much for the detailed response.
Re: Python Driver Write Times
About VM resources: I had suspected there would be a hit in this sense, but I wasn't aware of any actual numbers. Thanks for that.

About 3 writes / the need for 2 more nodes: I had no idea about that (x# of writes per object). Riak is very unfamiliar territory for me. I'll read the guide that has been suggested and look at running my tests under a more "optimal" setup.

Thanks to everyone who responded to this issue. If I get the numbers increased, I'll be sure to post a followup (in case anyone cares).
Re: Python Driver Write Times
From just the switch to the PBC transport, I see this kind of increase:

REST HTTP:

    MakeRiakPerson took 40.139 ms
    MakeRiakPerson took 40.472 ms
    MakeRiakPerson took 40.651 ms
    MakeRiakPerson took 51.630 ms
    MakeRiakPerson took 36.733 ms

Protocol Buffers:

    MakeRiakPerson took 1.989 ms
    MakeRiakPerson took 2.650 ms
    MakeRiakPerson took 1.523 ms
    MakeRiakPerson took 3.707 ms
    MakeRiakPerson took 7.213 ms
    MakeRiakPerson took 3.141 ms

That almost pushes it down, in and of itself, to the speeds I got with MongoDB. My next step will be moving to replicate an environment where I have at least 3 nodes up and running.

Again, thanks to everyone who lent a hand. You've made a Riak believer out of me.
Re: Question about Bitcask
No, the portion of Bitcask written in C only manages the in-memory keydir structure. Reading and writing of files, coordination of merges, etc. is all done in Erlang.

Thanks,
Dan

Daniel Reverri
Developer Advocate
Basho Technologies, Inc.
d...@basho.com