Jan - I am hacking on it a bit to more closely match your use case. As soon as I have it done I will send it and the test generation script I'm using to populate test data.
--Kevin

On Nov 9, 2010, at 10:35 AM, Jan Buchholdt wrote:

> Kevin -
>
> The test client is part of a bigger system and would be a bit too much to
> send to you. The method that calls Riak looks like this:
>
>     import com.basho.riak.client.*;
>     .
>     .
>     public List<Document> lookupDocuments(String personId, String url) {
>         RiakClient riak = new RiakClient(url);
>
>         WalkResponse walkResponse = riak.walk("person", personId, "document,_,_");
>         if (walkResponse.isSuccess()) {
>             List<Document> out = new ArrayList<Document>();
>             List<? extends List<RiakObject>> steps = walkResponse.getSteps();
>             if (steps.size() != 1) {
>                 throw new RuntimeException("Expected to walk one link. Walked " + steps.size());
>             }
>             List<RiakObject> step = steps.get(0);
>             for (RiakObject o : step) {
>                 try {
>                     String chars = o.getValue();
>                     Builder builder = Protos.Document.newBuilder();
>                     JsonFormat2.merge(chars, builder);
>                     out.add(((Document) builder.build()).getDocument());
>                 } catch (ParseException e) {
>                     throw new DocumentServiceException("Error parsing document", e);
>                 }
>             }
>             return out;
>         } else {
>             throw new RuntimeException("Walk error: " + walkResponse.getHttpHeaders());
>         }
>     }
>
> It could be interesting to repeat your test on our cluster to see if we get
> the same numbers as you do. Is it possible for you to send the code behind
> your test?
>
> --
> Jan Buchholdt
> Software Pilot
> Trifork A/S
> Cell +45 50761121
>
>
> On 2010-11-09 15:47, Karsten Thygesen wrote:
>> On Nov 9, 2010, at 14:58 , Kevin Smith wrote:
>>
>>> On Nov 9, 2010, at 5:01 AM, Karsten Thygesen wrote:
>>>
>>>> Hi
>>>>
>>>> OK, we will use a larger ringsize next time and will consider a data
>>>> reload.
>>>>
>>>> Regarding the metrics: the servers are dedicated to Riak and are not
>>>> used for anything else. They are new HP servers with 8 cores each and
>>>> 4x146GB 10K RPM SAS disks in a concatenated mirror setup.
>>>> We use Solaris with ZFS as the filesystem, and I have turned off atime
>>>> updates on the data partition.
>>>>
>>>> The pool is built as follows:
>>>>
>>>>      pool: pool01
>>>>     state: ONLINE
>>>>     scrub: scrub completed after 0h0m with 0 errors on Tue Oct 26 21:25:05 2010
>>>>    config:
>>>>
>>>>         NAME          STATE     READ WRITE CKSUM
>>>>         pool01        ONLINE       0     0     0
>>>>           mirror-0    ONLINE       0     0     0
>>>>             c0t0d0s7  ONLINE       0     0     0
>>>>             c0t1d0s7  ONLINE       0     0     0
>>>>           mirror-1    ONLINE       0     0     0
>>>>             c0t2d0    ONLINE       0     0     0
>>>>             c0t3d0    ONLINE       0     0     0
>>>>
>>>> errors: No known data errors
>>>>
>>>> so it is as fast as possible.
>>>>
>>>> However, we use the ZFS default blocksize, which is 128 KB - is that
>>>> optimal with bitcask as the backend? It is rather large, but what is
>>>> optimal for bitcask?
>>> I don't have much experience tuning Solaris or ZFS for Riak. This is a
>>> question best asked of Ryan, and I will make sure he sees it.
>> Thanks!
>>
>>>> The cluster is 4 servers with gigabit connections located in the same
>>>> datacenter on the same switch. The loadbalancer is a Zeus ZTM, which does
>>>> quite a few HTTP optimizations, including extended reuse of HTTP
>>>> connections, and we usually see far better response times going through
>>>> the loadbalancer than hitting a node directly.
>>> Hmmm. Can you share what the performance times are like for direct cluster
>>> access?
>> In this case, there is no measurable difference whether we ask a cluster
>> node directly or go through the loadbalancer. The largest difference shows
>> up when we hit it with a lot of small requests, but that is not the case
>> here.
>>
>>>> When we run the test, each Riak node is only about 100% CPU loaded (which
>>>> on Solaris means it only uses one of the 8 cores). We have seen spikes in
>>>> the 160% area, but anything below 800% is not CPU bound. So, all in all,
>>>> the CPU load is between 5 and 10%.
>>> Can you send me the code you're using for the performance test?
>>> I'd like to run the exact code on my test hardware and see if that
>>> reveals anything.
>> Jan, can you please provide the test client?
>>
>>> Also, low CPU usage might indicate you are IO bound. Do you know if the
>>> Riak processes are spending much time waiting for IO to complete?
>>>
>> It does not seem so. The servers are not IO bound, there is plenty of
>> network capacity, and the disks are only around 10% loaded.
>>
>> My largest suspicion is the datamodel - when you have a 4-node cluster and
>> do a linkwalk that needs to combine around 500-600 documents, it will take
>> quite some time, but we still feel that the numbers are very high.
>>
>> Perhaps we should consider a datamodel where we collect, say, 100 documents
>> in a basket and then only have to linkwalk 4-5 baskets to return an answer?
>> Tempting, performance-wise, but it makes the data a lot harder to maintain
>> afterwards, as we cannot just use map-reduce and similar technologies to
>> handle the data...
>>
>> Karsten
>>
>>> --Kevin
>>>
>>> _______________________________________________
>>> riak-users mailing list
>>> riak-users@lists.basho.com
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
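[Editor's note] The basket datamodel Karsten floats above can be illustrated with a small, self-contained sketch of the fan-out arithmetic: grouping documents into fixed-size baskets turns one link walk over 500-600 document links into a fetch of a handful of basket objects. This is plain Java with no Riak client involved; the class name, `BASKET_SIZE`, and the key-partitioning helper are all hypothetical, chosen only to mirror the "100 documents per basket" figure from the thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "basket" datamodel discussed in the thread:
// instead of linking a person to every document, link the person to a few
// baskets, each holding up to BASKET_SIZE document keys.
public class BasketSketch {

    static final int BASKET_SIZE = 100; // documents per basket (assumed)

    // Partition a flat list of document keys into basket-sized groups.
    static List<List<String>> toBaskets(List<String> docKeys) {
        List<List<String>> baskets = new ArrayList<>();
        for (int i = 0; i < docKeys.size(); i += BASKET_SIZE) {
            baskets.add(new ArrayList<>(
                docKeys.subList(i, Math.min(i + BASKET_SIZE, docKeys.size()))));
        }
        return baskets;
    }

    // Objects a link walk must fetch when every document is linked directly.
    static int objectsFetchedPerDocument(int docCount) {
        return docCount;
    }

    // Objects fetched when only baskets are linked (ceiling division).
    static int objectsFetchedPerBasket(int docCount) {
        return (docCount + BASKET_SIZE - 1) / BASKET_SIZE;
    }

    public static void main(String[] args) {
        int docs = 550; // roughly the 500-600 documents mentioned above
        System.out.println("per-document links walked: " + objectsFetchedPerDocument(docs));
        System.out.println("baskets walked: " + objectsFetchedPerBasket(docs));
    }
}
```

The trade-off is exactly the one Karsten names: far fewer objects per walk, but every document insert or delete now has to update basket membership as well, which complicates maintenance and map-reduce over individual documents.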