On top of that, Riak is, by default, going to read the freshly written copy 
of the data back from disk and return it to the client. You can disable this 
behavior by setting ReturnBody to false, which is great for bulk data loads.

Consumer hard drives just plain suck for performance. You're going to be 
limited by tiny caches, slow rotational speeds, and (unless you have newish 
gear) a slower I/O backplane. Since you probably can't change your hardware 
quickly, look at how your application uses it first.

My guess is that your read/write pattern is skewing your performance 
tremendously. Hints for performance:

- If you must read from the same disk you write to, read into memory. If 
  the data won't fit into memory, buffer it in chunks.
- Multi-thread your writes for performance.
- Avoid HTTP like the plague (use protocol buffers instead).
- Use the fastest gear money can buy: Riak is going to be heavily disk I/O 
  bound, so tune your storage appropriately. If you can't afford SSDs, try 
  15k SAS drives. If you can't afford or use 15k SAS drives, use a buttload 
  of 10k drives in RAID 10 with a hardware controller. If you can't use 
  hardware RAID, don't RAID.
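A minimal Python sketch of the first two hints, under stated assumptions: the CSV is pulled into memory a chunk at a time, so the disk is not seeking between the input file and the data files on every single write, and each chunk's writes are spread over a thread pool. `read_in_chunks`, `bulk_load`, and the `store` callable are hypothetical names; `store` stands in for whatever Riak client write call you use (ideally over protocol buffers, with ReturnBody off) and is not a real Riak client API.

```python
import csv
import io
import threading
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def read_in_chunks(lines: Iterable[str], chunk_size: int) -> Iterator[List[List[str]]]:
    """Yield lists of up to chunk_size parsed CSV rows.

    Each chunk is fully read into memory before any of its rows are written,
    so reads of the input file are not interleaved with individual writes.
    """
    reader = csv.reader(lines)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


def bulk_load(lines: Iterable[str], store: Callable[[str, str], None],
              chunk_size: int = 1000, workers: int = 8) -> int:
    """Write (key, value) rows using a pool of writer threads.

    `store` is a hypothetical stand-in for a client write call.
    Returns the number of rows written.
    """
    written = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for chunk in read_in_chunks(lines, chunk_size):
            futures = [pool.submit(store, key, value) for key, value in chunk]
            # Wait for this chunk's writes (re-raising any error) before
            # reading the next chunk from the input file.
            for f in futures:
                f.result()
            written += len(chunk)
    return written


if __name__ == "__main__":
    fake_db = {}
    lock = threading.Lock()

    def fake_store(key: str, value: str) -> None:
        # In-memory stand-in for a real Riak write.
        with lock:
            fake_db[key] = value

    data = io.StringIO("".join(f"k{i},v{i}\n" for i in range(2500)))
    print(bulk_load(data, fake_store, chunk_size=1000, workers=4))  # 2500
```

Chunk size and thread count are knobs to tune against your own disks; the sketch only shows the shape of the loop, not good values for them.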

---
Jeremiah Peschka - Founder, Brent Ozar PLF, LLC
Microsoft SQL Server MVP

On Nov 20, 2011, at 1:59 PM, Aphyr wrote:

> On 11/20/2011 01:34 PM, Catalin Constantin wrote:
>> To keep it simple: no more networking. Just one node (with n = 1) and
>> local tests.
>> 
>> The data comes from a simple CSV file read (ruled out too, because
>> this is fast).
> 
> Read from the same disk? If you're interleaving every write with a read from 
> this file, how many back-and-forth seeks do you think your disk is doing?
> 
>> HDD: 2 x 750 GB SATA 2 (RAID1)
> 
> Hint hint hint.
> 
>> What insert rate should I expect on a normal Debian 6.0 64-bit
>> installation (no tweaks)?
> 
> 450 inserts/second. Or, if you address some of the points I mentioned 
> earlier, perhaps 2000-4000/sec, depending on write characteristics. Most 
> people find performance improves linearly with nodes, so long as the network 
> is not the bottleneck.
> 
> Our six-node cluster (bitcask-dedicated SSDs, 2x bonded gige, 2:1 read:write 
> ratio, median value ~10 kB, n_val 3, typical r/w: quorum) tops out at about 
> 3,000 aggregate ops/sec while maintaining reasonable (~10ms 99%) latencies. I 
> can push it higher if I relax latency constraints.
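A back-of-envelope reading of those numbers, sketched in Python. It assumes each client request is sent to all n_val replicas (Riak coordinates gets and puts across the full preference list and then waits for R or W replies), that load spreads evenly over the six nodes, and that the ~10 kB median value size holds for writes. `per_node_load` is a hypothetical helper, not part of any Riak tool.

```python
def per_node_load(total_ops: float, reads_per_write: float, n_val: int,
                  nodes: int, value_kb: float) -> dict:
    """Rough per-node replica load for a Riak-style cluster.

    Assumes every client op fans out to all n_val replicas and that the
    resulting replica operations are spread evenly across nodes.
    """
    writes = total_ops / (reads_per_write + 1)
    reads = total_ops - writes
    replica_ops = total_ops * n_val              # each op touches n_val replicas
    write_kb_per_s = writes * n_val * value_kb   # bytes landing on disk, all replicas
    return {
        "client_reads": reads,
        "client_writes": writes,
        "replica_ops_per_node": replica_ops / nodes,
        "write_mb_s_per_node": write_kb_per_s / nodes / 1000,
    }


print(per_node_load(total_ops=3000, reads_per_write=2, n_val=3, nodes=6, value_kb=10))
# {'client_reads': 2000.0, 'client_writes': 1000.0, 'replica_ops_per_node': 1500.0, 'write_mb_s_per_node': 5.0}
```

Under those assumptions each node's SSDs absorb roughly 1,500 replica operations per second, three times the 500 ops/sec per node that the aggregate client figure suggests.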
> 
>> I can only compare it with other DBs I have tested on the same machine,
>> e.g. MongoDB, Kyoto Tycoon.
> 
> These databases solve different problems in different ways. You should expect 
> them to perform differently. The question is: for your workload, what balance 
> of raw IOPS, redundancy, availability, latency, and conflict handling model 
> fits best? Riak trades IOPS for availability and redundancy, and trades 
> MVCC/locking for vclock resolution.
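To make the vclock side of that trade concrete, here is a minimal Python sketch of generic vector clock comparison. It is a simplification, not Riak's implementation: Riak's vclocks also carry timestamps and get pruned, and the actor names below are made up. Instead of taking a lock, each update bumps its actor's counter; when neither clock descends from the other, the writes were concurrent, and with allow_mult enabled Riak keeps both values as siblings for the application to resolve.

```python
from typing import Dict

VClock = Dict[str, int]  # actor id -> update counter


def increment(clock: VClock, actor: str) -> VClock:
    """Return a copy of clock recording one more update by actor."""
    new = dict(clock)
    new[actor] = new.get(actor, 0) + 1
    return new


def descends(a: VClock, b: VClock) -> bool:
    """True if a has seen every update that b has seen."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a: VClock, b: VClock) -> str:
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a_newer"
    if descends(b, a):
        return "b_newer"
    return "concurrent"  # neither wins: keep both values as siblings


base = increment({}, "client_x")
left = increment(base, "client_x")    # client_x updates again
right = increment(base, "client_y")   # client_y updates the same base
print(compare(left, base))   # a_newer
print(compare(left, right))  # concurrent
```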
> 
> --Kyle
> 
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
