Question about Bitcask

2010-11-28 Thread Kostya V
Is there any way to use it without Erlang, just with C?
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Python Driver Write Times

2010-11-28 Thread Derek Sanderson
Hello,

I've recently started to explore using Riak (v0.13.0-2) from Python (v2.6.5)
as a datastore, and I've run into a performance issue whose true origin I'm
unsure of. I'd like some input from users who have been working with Riak
and its Python drivers.

I have 2 tests set up, one for Riak and another for MongoDB, both using
their respective Python drivers. I'm constructing chunks of JSON data
consisting of a Person, who has an Address, and a purchase history which
contains 1 to 20 line items with some data about the item name, cost,
# purchased, etc. A very simple mockup of a purchase history. It does this
for 1 million "people" (my initial goal was to see how lookups fared once
you reach 1m+ records).
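
Roughly, MakePerson builds something like the following (a sketch only; the
field names and value ranges here are illustrative, not my exact code):

import random

def MakePerson():
    # Mock purchase history: 1 to 20 line items per person.
    items = [{"name": "item-%d" % n,
              "cost": round(random.uniform(0.99, 99.99), 2),
              "num_purchased": random.randint(1, 5)}
             for n in range(random.randint(1, 20))]
    return {"name": "Person",
            "address": {"street": "123 Main St",
                        "city": "Springfield"},
            "purchases": items}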

When using MongoDB, the speed of inserts is incredibly fast. When using
Riak, however, there is a very noticeable lag after each insert. So much so
that when running side by side, the MongoDB test breaks into the 10,000s
before Riak hits its first 1k.

My main PC is a Windows 7 i7 quad-core with 8 GB of RAM, on which I'm
running 64-bit Ubuntu 10.04 in a VM with 2 GB of memory allotted. On this
VM, I have Riak and MongoDB running concurrently.

Here is a sample of how I'm using the Riak driver:

riak_conn = RiakClient()
bucket = riak_conn.bucket("peopledb")
for i in range(1, 100):
    try:
        new_obj = bucket.new("p" + str(i), MakePerson())
        new_obj.store(return_body=False)
    except Exception as e:
        print e

I'm wondering if there is something blatantly wrong with what I'm doing. I
didn't see any kind of batch-store method on the bucket (something to
persist the entire bucket at once instead of calling store on each object),
and I wasn't sure if this was an issue with my particular setup (maybe the
specifics of my VM are somehow throttling its performance) or just a known
limitation that I wasn't aware of.

To shed some light on the disparity, I refactored my persistence into
separate methods and used a wrapper to pull out the execution times
(sketched after the numbers below). Here is a very condensed list of run
times. The method in question, for both datastores, simply creates a new
"Person" and stores it. Nothing else.

MakeRiakPerson took 40.139 ms
MakeRiakPerson took 40.472 ms
MakeRiakPerson took 40.651 ms
MakeRiakPerson took 51.630 ms
MakeRiakPerson took 36.733 ms

MakeMongoPerson took 1.810 ms
MakeMongoPerson took 3.619 ms
MakeMongoPerson took 1.036 ms
MakeMongoPerson took 1.275 ms
MakeMongoPerson took 3.656 ms
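
(The wrapper itself is just a simple timing decorator along these lines; a
sketch with illustrative names, not the exact code:)

import time
from functools import wraps

def timed(func):
    # Prints the wall-clock duration of each call in milliseconds.
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        print "%s took %.3f ms" % (func.__name__,
                                   (time.time() - start) * 1000.0)
        return result
    return wrapper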

Thank you in advance for any help that can be offered here. I'm incredibly
new to Riak as a whole, as well as very inexperienced when it comes to
working in a *nix environment, so I imagine there are countless ways I could
have shot myself in the foot without realizing it.


Re: Python Driver Write Times

2010-11-28 Thread Greg Steffensen
This is due to two factors:

1) Durability.  MongoDB stores writes in RAM and flushes them to disk
periodically (by default, every 60 seconds, according to this page:
http://www.mongodb.org/display/DOCS/Durability+and+Repair).  This means that
its writes can seem very, very fast, but if the machine goes down, you could
lose up to 60 seconds of data.  Riak writes don't return until the data has
actually been persisted to disk.  Cassandra takes the same approach as
MongoDB, with the same trade-off.
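
(If you want a more apples-to-apples durability comparison, you can ask
Mongo to wait on each write. A rough sketch, assuming the pymongo version in
use passes these options through to getLastError:)

from pymongo import Connection  # pymongo 1.x style

people = Connection().testdb.people  # hypothetical db/collection names

# safe=True blocks until the server acknowledges the write; fsync=True
# additionally asks the server to flush to disk first, which is closer to
# Riak's default behavior. Both options are assumptions about this client.
people.insert({"name": "Test Person"}, safe=True, fsync=True)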

2) Parallelism.  This test isn't taking advantage of Riak's distributed
nature.  Riak really shines when it's run on a cluster of machines: you can
make your write throughput almost arbitrarily fast, as long as you're
willing to add enough machines to the cluster.

I doubt that you'll be able to get single-node Riak to write as fast as
Mongo, but I'd guess that the numbers will get a little closer if you do
several writes simultaneously in both by multi-threading with Python's
threading module.  Also, be sure that you're using Riak's protocol buffers
interface instead of the REST (HTTP) one, which adds a lot of overhead; I
believe the Python client supports both.
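
(As a rough illustration of the threading suggestion, a sketch only: it
assumes the MakePerson function from the original post, the protocol
buffers transport, and one client connection per thread:)

import threading
import riak
from riak import RiakClient

def write_range(start, stop):
    # One client per thread, on the protocol buffers port (8087 by default).
    conn = RiakClient(port=8087, transport_class=riak.RiakPbcTransport)
    bucket = conn.bucket("peopledb")
    for i in range(start, stop):
        try:
            obj = bucket.new("p" + str(i), MakePerson())
            obj.store(return_body=False)
        except Exception as e:
            print e

# Split 1,000 keys across 4 writer threads.
threads = [threading.Thread(target=write_range, args=(n * 250, (n + 1) * 250))
           for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()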

Greg


Re: Python Driver Write Times

2010-11-28 Thread Jeremiah Peschka
To add to this, you won't see write speeds as fast as people have reported
in a variety of benchmarks because of I/O subsystem virtualization. You take
a 10-15% performance hit on virtualized disk even when using a pure
hypervisor like VMware ESX. Depending on your VM software, you could be
taking a much larger hit to your disk performance.

As Greg said, though, the big reasons for this difference are that MongoDB
caches writes in memory until the next file system sync interval and that
Riak is designed to be run in a cluster of at least 3 machines
(preferably more).

Jeremiah Peschka
Microsoft SQL Server MVP
MCITP: Database Developer, DBA



Re: Python Driver Write Times

2010-11-28 Thread Alexander Sicular
Keep in mind that if you are using the standard Riak install, each of your
writes is actually written three times to disk. The default configuration is
an n_val of 3 for all buckets. For this reason I usually change the default
config file or change the n_val of the specific bucket I'm using for
testing, as in the sketch below.
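
(A minimal sketch of the per-bucket approach; it assumes the Python client's
bucket API exposes set_n_val/get_n_val, and it should be done before any
data is written to the bucket:)

from riak import RiakClient

client = RiakClient()
bucket = client.bucket("peopledb")

# Drop replication to a single copy, for write-speed testing only.
bucket.set_n_val(1)
print bucket.get_n_val()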

-Alexander Sicular

@siculars


Re: Python Driver Write Times

2010-11-28 Thread Daniel Lindsley
You're using the HTTP interface, which is slower than the Protocol
Buffers interface. You should change your code from:

riak_conn = RiakClient()
bucket = riak_conn.bucket("peopledb")
for i in range(1, 100):
    try:
        new_obj = bucket.new("p" + str(i), MakePerson())
        new_obj.store(return_body=False)
    except Exception as e:
        print e

To:

from riak import RiakPbcTransport

riak_conn = RiakClient(port=8087, transport_class=RiakPbcTransport)
bucket = riak_conn.bucket("peopledb")
for i in range(1, 100):
    try:
        new_obj = bucket.new("p" + str(i), MakePerson())
        new_obj.store(return_body=False)
    except Exception as e:
        print e

In my testing, I saw a 200-300% speedup when switching to PB. As
previously suggested, you should also spin up two more Riak nodes
(making a cluster of 3 nodes). See
https://wiki.basho.com/display/RIAK/Basic+Cluster+Setup, specifically
the "Add a Second Node to Your Cluster" section.


Daniel



Re: Python Driver Write Times

2010-11-28 Thread Derek Sanderson
I'm using the defaults for the Python library, so that would be the HTTP
REST interface. There is support for the PBC interface, which I'm looking
into using now.

I had suspected that since I wasn't really using Riak in such a way as to
let it shine (i.e., in a cluster of nodes), that might be part of my
problem.

Thanks so much for the detailed response.


Re: Python Driver Write Times

2010-11-28 Thread Derek Sanderson
About VM resources: I had suspected there would be a hit in this sense, but
I wasn't aware of any actual numbers. Thanks for that.

About the 3 writes / the need for 2 more nodes: I had no idea about that
(multiple writes per object). Riak is very unfamiliar territory for me. I'll
read the guide that has been suggested and look at running my tests under a
more "optimal" setup.

Thanks to everyone who responded to this issue. If I get the numbers
increased, I'll be sure to post a followup (in case anyone cares).


Re: Python Driver Write Times

2010-11-28 Thread Derek Sanderson
From just the switch to the PBC transport, I see this kind of improvement:

REST HTTP:
MakeRiakPerson took 40.139 ms
MakeRiakPerson took 40.472 ms
MakeRiakPerson took 40.651 ms
MakeRiakPerson took 51.630 ms
MakeRiakPerson took 36.733 ms

ProtoBuffers:
MakeRiakPerson took 1.989 ms
MakeRiakPerson took 2.650 ms
MakeRiakPerson took 1.523 ms
MakeRiakPerson took 3.707 ms
MakeRiakPerson took 7.213 ms
MakeRiakPerson took 3.141 ms

That switch alone almost pushes it down to the speeds I got with MongoDB.
My next step will be to replicate an environment where I have at least 3
nodes up and running.

Thanks for the assistance, everyone. You've made a Riak believer out
of me.


Re: Question about Bitcask

2010-11-28 Thread Dan Reverri
No, the portion of Bitcask written in C only manages the in-memory keydir
structure. Reading and writing of files, coordination of merges, etc. are
all done in Erlang.

Thanks,
Dan

Daniel Reverri
Developer Advocate
Basho Technologies, Inc.
d...@basho.com

