On Fri, Jun 25, 2010 at 8:25 PM, Ryan Tilder <[email protected]> wrote:
> Hi, Dmitry.  There are some gaps in the information you included here that
> might help clarify what's going on so I'm going to just rattle off some
> questions for clarification.
> Is your test driver only making requests of a single EC2 instance?  Or are
> you querying all 7 nodes directly in some sort of load distribution?   If you
> aren't querying all 7 nodes directly, then you will likely see performance
> on par with a cluster with only a single "physical" node.

I tried both ways: querying only one node and querying all the nodes.
The results were approximately the same. But as far as I understand,
that's the expected result for map-reduce queries, isn't it?
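
For illustration, the two request patterns can be sketched as follows.
This is a minimal Python sketch rather than the Erlang test driver
itself; the host names and the assign_requests helper are hypothetical,
and plain round-robin is only one assumed way of spreading requests
over the nodes.

```python
from itertools import cycle

# Hypothetical host names; the real ring had 7 EC2 instances.
NODES = [f"riak{i}.example.com" for i in range(1, 8)]

def assign_requests(num_requests, nodes):
    """Return the node each request would be sent to, in round-robin order."""
    picker = cycle(nodes)
    return [next(picker) for _ in range(num_requests)]

# Pattern 1: every request goes to the same node.
single = assign_requests(10, NODES[:1])

# Pattern 2: requests rotate over all 7 nodes.
spread = assign_requests(10, NODES)

print(single)
print(spread)
```

In the first pattern one node coordinates every request; in the second
the coordinating work rotates over the whole ring.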

> Are you certain that the 7 nodes are communicating with each other?  The
> output of the "riak-admin status" command should list the nodes in the
> "ring_members" field.

Yes, sure.

> Are the "documents" a separate key with Riak's built-in links to the
> "entities" or are they keys with a data blob that refers to the entities?[1]
> If the latter, have you read
> http://blog.basho.com/2010/02/24/link-walking-by-example/ ?

To simplify, the document data (that is, the value stored in Riak) had
a structure like this:

[{entities, [123, 456, 745, 2352, 235 | ...]}].

I actually used timestamps in microseconds for the IDs, but that
doesn't really matter.
And, regarding your last question, documents and entities were stored
in different buckets.
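
As a rough illustration of how such a document relates to a map-reduce
query over its entities, here is a minimal Python sketch (not the
Erlang code used in the tests). The bucket name, the key values and the
helper functions are hypothetical, and the job layout follows the JSON
shape of Riak's HTTP map-reduce interface as I understand it, with the
built-in Riak.mapValuesJson map function assumed as the only phase.

```python
import json

# Hypothetical bucket name, for illustration only.
ENTITY_BUCKET = "entities"

def mapred_inputs(document, bucket=ENTITY_BUCKET):
    """Turn a document value into [bucket, key] pairs, one per entity."""
    return [[bucket, str(key)] for key in document["entities"]]

def mapred_job(document):
    """Build a map-only job that returns the value of every listed entity."""
    return {
        "inputs": mapred_inputs(document),
        # Assumed phase: a built-in JavaScript map function.
        "query": [{"map": {"language": "javascript",
                           "name": "Riak.mapValuesJson"}}],
    }

# Same shape as the document value above, with made-up entity IDs.
document = {"entities": [123, 456, 745, 2352, 235]}
job = mapred_job(document)
print(json.dumps(job, indent=2))
```

A document with 1000 entities therefore produces a job with 1000
inputs, all of which are gathered by the node that received the query.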

What about links? Would they be faster in this case? Also, neither the
native Erlang API (the riak_client module) nor the Erlang PBC client
seems to have link-walking functions like the REST API does.


> It's also important for me to note that EC2 instances do not necessarily
> have the same characteristics as actual physical hardware when it comes to
> preventing resource contention.  Since EC2 instances are virtualized, you
> have no idea what other load the physical host of a given instance may be
> under.  As a result it is possible to have a Riak instance running on the
> same hardware as another IO and CPU intensive instance without your
> knowledge, impeding each other to a certain degree.  We've had a number of
> users complain of performance problems with Riak clusters running on EC2 at
> various times.  From my personal and anecdotal experience, EC2 seems to be
> pretty heavily oversubscribed much of the time which leads to intermittent
> performance issues for all kinds of applications.
> All of that is just a long winded way of saying: don't expect shared
> virtualized resources to provide the same performance as dedicated physical
> hardware.  But you should still see at least somewhat better performance
> than you're seeing now if your testing harness is testing properly.

Sure, I understand that. But I expected at least somewhat better performance.

Anyway, the day before yesterday I ran some tests using basho_bench.
These tests cheered me up a bit :)
Here's the link to the results:

http://demmonoid.livejournal.com/4098.html

Please let me know if you want me to add or correct any links to your
resources or add any more information about the tests.

> --Ryan
> 1. I'm not certain if you're saying that the documents are stored in a
> separate bucket from the entities in the same Riak cluster or a separate
> Riak cluster entirely.
> On Fri, Jun 25, 2010 at 12:02 AM, Dmitry Demeshchuk <[email protected]>
> wrote:
>>
>> Greetings.
>>
>> I tried running Riak with bitcask backend on 7 Amazon EC2 standard
>> large instances (7.5 GB RAM, 4 EC2 CPU units) and performed some
>> tests.
>> For comparison, I built up the following Riak clusters:
>>
>> 7 physical nodes ring
>> 1 physical node ring (on one of the 7 instances, but I ran the tests
>> separately so the rings won't mess with each other)
>> 1 physical node ring on an extra large instance (15 GB RAM, 8 EC2 CPU
>> units)
>>
>> and ran a couple of tests putting and getting data using Riak's
>> native Erlang API (not PBC).
>>
>> I had 2 buckets: the first one, called "entities", held small values
>> (about 1KB on average) but a lot of them (several million), and the
>> second one, called "documents", held lists of keys from the first
>> bucket. So, every document consists of a lot of entities (I used 100
>> and 1000 for my tests), and the approximate size of every document
>> was either 100KB or 1MB.
>>
>> So, I ran tests that wrote documents and entities to the database
>> and then read them back. I tried to perform reads and writes using 10
>> and 100 concurrent Erlang processes (well, 100 was generally too much
>> as I ran out of CPU), first from only one machine and then from 2 and
>> 3 machines at the same time (for the 7-node ring). Of course, the
>> entities were obtained using map-reduce.
>>
>> The first weird thing was that even with 10 concurrent reads and
>> writes the performance didn't differ across the three clusters. Okay,
>> 1 large and 1 extra large node don't differ that much, but the 7 nodes
>> should have given me some performance gain, shouldn't they?
>>
>> The second thing was that the average read time for one document with
>> 1000 entities was about 5 seconds, and again, the number of machines
>> in the cluster didn't affect the result. I guess I just hit the
>> performance limit of the instance that sent all the map-reduce
>> requests and then collected the replies, because when I ran tests on
>> the other 2 instances, all three had the same performance.
>>
>> The other strange thing was that during data writes the nodes were
>> not I/O-loaded most of the time. If it were a single-stream write,
>> that would be understandable. But there were 10, then 20, and then 30
>> simultaneous writing processes!
>>
>>
>> Unfortunately I cannot provide the detailed results now, they are
>> pretty messed up. I'm going to use basho_bench to make good graphs and
>> tables of these tests.
>>
>> Any advice for future tests, or any explanation for such strange
>> performance?
>>
>> Thank you in advance, and sorry for the somewhat messy e-mail.
>>
>> --
>> Best regards,
>> Dmitry Demeshchuk
>>
>> _______________________________________________
>> riak-users mailing list
>> [email protected]
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>



-- 
Best regards,
Dmitry Demeshchuk
