Thanks for the feedback. I made two changes to my test setup and saw better throughput:
1) Don't write to the same key over and over. Updating a key appears to be
   a lot slower than creating a new key.
2) I used parallel PUTs.

The throughput I was measuring before was about 26MB/s on localhost. With
these changes it went to around 200MB/s on a disk that can write at about
480MB/s. That is more the type of performance I need for the data store we
have in mind. I am going to proceed with testing on 8 nodes with RAID0
drives.

Here are some details of the testing I did, in case they help others. I
tried the test with 1MB, 10MB, and 20MB binary data. I didn't notice a big
signal with regard to larger objects slowing things down.

wget http://downloads.basho.com.s3-website-us-east-1.amazonaws.com/riak/1.2/1.2.1/rhel/5/riak-1.2.1-1.el5.x86_64.rpm
sudo rpm -Uvh riak-1.2.1-1.el5.x86_64.rpm
/usr/sbin/riak start
mkdir data-dir && cd data-dir
seq -w 0 100 | parallel dd if=/dev/zero of={}.10meg bs=8k count=1280
http_proxy= # don't contact proxy
time find . -name \*.10meg | parallel -j8 -n1 wget --post-file {} http://127.0.0.1:8098/riak/test1/{}

During these tests I saw beam.smp jump to 350-550 in the %CPU column of
top. When I was seeing slower throughput, beam.smp was using much less CPU.

Kind regards,

-Matt

On Wed, Apr 3, 2013 at 7:20 AM, Reid Draper <reiddra...@gmail.com> wrote:

> inline:
>
> On Apr 2, 2013, at 6:48 PM, Matthew MacClary <maccl...@lifetime.oregonstate.edu> wrote:
>
> > Hi all, I am new to this list. Thanks for taking the time to read my
> > questions! I just want to know if the data throughput I am seeing is
> > expected for the bitcask backend or if it is too low.
> >
> > I am doing the preliminary feasibility study to decide if we should
> > implement a Riak data store. Our application involves rendering chunks of
> > data that range in size from about 1MB-9MB or so. This rendering work is
> > CPU intensive so it is spread over a bunch of compute nodes which write the
> > output into a data store.
>
> Riak is not intended to store objects of this size, not at the moment
> anyway. Riak CS [1], on the other hand, can store files up to several TB.
> That being said, Riak CS may or may not have other qualities you desire.
> It's a known issue [2] that the Riak object size limitations should be
> better documented.
>
> > After rendering, a second process consumes the data chunks from the data
> > store at a rate of about 480MB/s in a streaming configuration, so there is
> > more than 480MB/s of new data coming in at the same time the data is
> > being read.
>
> Is this a single-socket, or is there some concurrency here?
>
> > My testing so far involves a one node cluster on a dev box. What I wanted
> > to show is that Riak writes were limited by the hard disk throughput. So
> > far I haven't seen writes to localhost come anywhere close to the hard disk
> > throughput:
> >
> > $ MYFILE=/tmp/output.png
> > $ dd if=/dev/zero of=$MYFILE bs=8k count=256k
> > 262144+0 records in
> > 262144+0 records out
> > 2147483648 bytes (2.1 GB) copied, 4.48906 seconds, 478 MB/s
> > $ rm $MYFILE
> >
> > So the hard disk throughput is around 478MB/s for this simple write test.
> >
> > The next test I did was to load a 39MB binary file into my one node
> > cluster. I used a script to do 12 POSTs with curl and 12 POSTs with wget.
> >
> > curl --tcp-nodelay -XPOST http://${IP}:${PORT}/riak/test/file3 \
> >   -H "Content-Type:application/octet-stream" \
> >   --data-binary @${UPLOAD_FILE} \
> >   --write-out "%{speed_upload}\n"
> >
> > wget --post-file ${UPLOAD_FILE} http://127.0.0.1:8098/riak/test/file1
> >
> > What I found was that I could get only about 26MB/s with this command line
> > testing. Does this seem about right? Should I see an 18x slowdown relative
> > to the write speed of the disk drive?
>
> Was this running the 24 (12 * 2) uploads in serial or parallel? With a
> single-threaded workload, you're unlikely to get Riak to be able to
> saturate a disk.
> Furthermore, there are design decisions in Riak at the
> moment that make it less than optimal for single objects of 39MB.
> Single-object high throughput (measured in MB/s) is more in the wheelhouse
> of Riak CS than Riak on its own, which is primarily designed for low latency
> and high throughput (measured in ops/sec). One of the ways that Riak CS
> achieves this on top of Riak is by introducing concurrency between the
> end-user and Riak.
>
> > Thanks for your comments on my application and test approach!
>
> Hope this helps,
> Reid
>
> [1] http://docs.basho.com/riakcs/latest/
> [2] https://github.com/basho/basho_docs/issues/256
>
> > -Matt
> >
> > -----------------------------------------------
> > Dev Environment Details:
> > dev box running RHEL6.2, 12 cores, 48GB, 6Gb/s SAS 15k HD
> > Riak 1.2.1 from
> > http://downloads.basho.com.s3-website-us-east-1.amazonaws.com/riak/1.2/1.2.1/rhel/5/riak-1.2.1-1.el5.x86_64.rpm
> > n_val=1
> > r=1
> > w=1
> > backend=bitcask
> >
> > Deploy Environment Details:
> > Node to node bandwidth greater than 40Gb/s
> > similar config for node servers
> > n_val=3
> > r=1
> > w=1
> > backend=?
> > _______________________________________________
> > riak-users mailing list
> > riak-users@lists.basho.com
> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
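
For anyone who wants to repeat the test without GNU parallel, the two changes described at the top of this message (a distinct key per object, several uploads in flight at once) could be sketched in Python roughly as follows. This is only a sketch under assumptions: the function names, the default of 8 workers, and the use of PUT against the /riak/<bucket>/<key> URL are illustrative choices. The commands in this message used wget --post-file instead, and this script was not part of the measurements reported here.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from urllib import request

# Assumption: a local Riak node and the bucket URL used in this thread.
BASE_URL = "http://127.0.0.1:8098/riak/test1"

# An opener built with an empty ProxyHandler ignores http_proxy, similar in
# spirit to clearing http_proxy before running wget.
_opener = request.build_opener(request.ProxyHandler({}))


def put_object(key, payload, base_url=BASE_URL):
    """Store payload under its own key and return the number of bytes sent."""
    req = request.Request(
        f"{base_url}/{key}",
        data=payload,
        method="PUT",
        headers={"Content-Type": "application/octet-stream"},
    )
    with _opener.open(req) as resp:
        resp.read()
    return len(payload)


def measure_throughput(upload, keys, payload, workers=8):
    """Call upload(key, payload) once per key, `workers` calls at a time.

    Returns (total_bytes, megabytes_per_second).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        total = sum(pool.map(lambda key: upload(key, payload), keys))
    # Guard against a zero interval when the upload function is a stub.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total, total / elapsed / 1e6
```

A call such as measure_throughput(put_object, [f"obj{i:03d}" for i in range(101)], bytes(10 * 1024 * 1024)) would roughly mirror the 101 x 10MB run; the key names are hypothetical. Passing the upload function as an argument keeps the timing logic separate from the HTTP call, so it can be tried with a stub before pointing it at a real node.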