Hi,

On Dec 14, 2010, at 4:47 AM, Jake Eakle wrote:
> While attempting to write a YCSB[1] interface for Riak (for a school project,
> so alternative benchmarking tools aren't solutions), I have encountered some
> pretty severe performance issues with the official Java client. In
> particular, an insert currently takes a minimum of 50 milliseconds, which I
> have determined are spent in HttpClient.executeMethod() (10 msec) and
> HttpMethod.getResponseBody() (40 msec). These are both calls into the Apache
> Commons HttpLib library that the Riak client depends on. At these speeds,
> even loading a YCSB workload into Riak is practically untenable -- by
> comparison, we benchmarked Cassandra inserts at roughly .5 msec.
>
> I heard something in the irc channel about using protobuf instead, but the
> section of the code that uses protobuf is utterly undocumented, and seems to
> mostly be supporting the StreamClient, which, if the README is to be
> believed, is only for streaming reads anyway.
>
> Any practical advice about how to get Riak up and running from Java at
> workable speeds would be very much appreciated.

Some hints from my forays into benchmarking Riak:

With the HTTP protocol:

- Feeding one item at a time to a single node of the ring can be surprisingly
  slow. Put an HTTP load balancer such as nginx in front of the nodes and
  you'll get a speed improvement of at least an order of magnitude.
- Also, feed the data through more than one connection. Using HTTP and a load
  balancer, I've seen nearly linear speed improvements up to 32 processes
  feeding data. I didn't test beyond 32 processes, so I don't know how things
  scale further up the parallelism ladder, but I believe throughput will keep
  growing at a comparable rate.
- Adding more nodes to the ring (preferably on separate physical machines, or
  at least on separate spindles) will improve your write speed further.

Using PB (Protocol Buffers) with Java:

- Writes are possible with the Java PB client, but performance seems erratic.
  Writing data in a single thread with no other traffic on the ring, I got
  between 50 and 140 inserts per second. I have no idea what causes the huge
  variance, though. It may be the protocol, it may be the Java client, or it
  may have been the Java GC running frequently because of my generated data.

In general:

- Have a look at your replication factor and write consistency settings. For
  optimal write speed you'll need a w-value of 1. Furthermore, if your
  replication factor is higher than the number of nodes (or rather: storage
  devices) in the ring, at some point you'll run into IO contention, because
  several different nodes end up fighting over the same hard disk.

For my benchmarking setup I had 6 nodes on 2 machines, each machine having a
RAID-0 across 4 disks, replication factor 3 and write consistency 1. Using HTTP
with a load balancer and 32 writing processes, I got over 500 writes per
second. I'd assume running only one or two nodes per machine would give you
another significant speed boost.

Regards,
Sven

------------------------------------------
Scoreloop AG, Brecherspitzstrasse 8, 81541 Munich, Germany, www.scoreloop.com
sven.rie...@scoreloop.com

Registered office: München; commercial register: Amtsgericht München, HRB 174805
Management board: Dr. Marc Gumpinger (Chairman), Dominik Westner,
Christian van der Leeden; Chairman of the supervisory board: Olaf Jacobi
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
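A minimal Java sketch of the HTTP advice in this thread (several parallel writers behind a load balancer, w=1), offered under stated assumptions rather than as the official client's API. It uses the JDK 11+ java.net.http.HttpClient instead of the Apache Commons HttpClient the official client wraps, assumes Riak's legacy /riak/<bucket>/<key> HTTP resource with a w query parameter (check the HTTP API docs for your Riak version), and uses threads in one JVM in place of separate writer processes. KeyWriter, ParallelRiakLoader and the JSON payload are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical "store one key" abstraction, so the transport can be swapped or faked. */
interface KeyWriter {
    void write(String key) throws Exception;
}

final class ParallelRiakLoader {

    private ParallelRiakLoader() {}

    /**
     * Builds a PUT against the (assumed) legacy Riak HTTP resource /riak/<bucket>/<key>,
     * asking for w=1 so the request returns once one replica has acknowledged the write.
     * Bucket and key are assumed to be URL-safe; no escaping is done here.
     */
    static HttpRequest putRequest(String baseUrl, String bucket, String key, String json) {
        URI uri = URI.create(baseUrl + "/riak/" + bucket + "/" + key + "?w=1");
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    /** A KeyWriter that stores a tiny (hypothetical) JSON document per key via baseUrl, e.g. a load balancer. */
    static KeyWriter httpWriter(HttpClient client, String baseUrl, String bucket) {
        return key -> {
            HttpResponse<Void> response = client.send(
                    putRequest(baseUrl, bucket, key, "{\"id\":\"" + key + "\"}"),
                    HttpResponse.BodyHandlers.discarding());
            if (response.statusCode() >= 300) {
                throw new IOException("PUT " + key + " failed with HTTP " + response.statusCode());
            }
        };
    }

    /**
     * Feeds all keys through a fixed pool of worker threads instead of one sequential
     * loop and returns the number of completed writes. The earliest failed write
     * (in key order) is rethrown as an ExecutionException.
     */
    static int loadAll(List<String> keys, int threads, KeyWriter writer) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String key : keys) {
                pending.add(pool.submit(() -> {
                    writer.write(key);
                    return null;
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // waits for this write; rethrows its failure if it had one
            }
            return pending.size();
        } finally {
            pool.shutdown();
        }
    }
}
```

For example, loadAll(keys, 32, httpWriter(HttpClient.newBuilder().version(HttpClient.Version.HTTP_1_1).build(), "http://riak-lb.example:8098", "usertable")) would push every key through 32 worker threads to a load balancer; the host name and bucket are placeholders. Keeping the transport behind the KeyWriter interface makes it easy to swap in a PB-based writer, or a fake one for testing.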