Hi Steve,

Riak does best with a lot of memory and a fast disk. Depending on how much data you have in the system, putting two nodes into 1GB of memory on a single VM may be causing the system to overrun available resources and page out to disk. Depending on how you've set up your virtualized environment, you could also be paying extra costs with each disk access, compounding the problem.

My first recommendation would be to either run the test again while monitoring disk operations using iostat to see if disk is the problem, or to just go ahead and test on bigger hardware. I suspect you will see much less of a performance difference between the tests once there are ample resources.
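If iostat is not handy on the VM, a rough equivalent of its %util column can be computed by sampling /proc/diskstats twice. This is only a sketch, not iostat itself: it assumes a Linux guest, and the function names (`parse_diskstats`, `utilization`, `sample`) are made up for illustration.

```python
import time


def parse_diskstats(text):
    """Map device name -> total milliseconds spent doing I/O.

    Each /proc/diskstats line is: major, minor, device name, then counters.
    The 13th whitespace-separated field is the time spent doing I/O (ms).
    """
    busy = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 13:
            busy[parts[2]] = int(parts[12])
    return busy


def utilization(before, after, interval_s):
    """Percent of the interval each device was busy between two samples."""
    interval_ms = interval_s * 1000.0
    return {
        dev: 100.0 * (after[dev] - before[dev]) / interval_ms
        for dev in after
        if dev in before
    }


def sample(interval_s=1.0, path="/proc/diskstats"):
    """Take two readings interval_s apart and return per-device utilization."""
    with open(path) as f:
        before = parse_diskstats(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_diskstats(f.read())
    return utilization(before, after, interval_s)
```

Calling `sample()` in a loop while the insert test runs gives a quick read on the disk: a device that sits near 100 for the whole run suggests the disk is the bottleneck.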
That said, some slowdown is expected when you turn on indexing, as Riak Search adds quite a bit of overhead in parsing and tokenizing the document, and then storing the results. There are two ways to speed up indexing:

1. Reduce the size of your documents. If your documents are large, but you only need one or two fields indexed, you can create smaller "surrogate" documents with just the fields you need indexed, plus a link back to your original document.

2. Batch your writes using the Solr interface. Riak Search uses "term-based partitioning", which reduces complexity during queries at the cost of increased complexity during writes. You can gain back some of the lost performance by batching your writes. This allows the system to plan which messages it sends more intelligently, thus sending fewer messages and reducing overhead. The downside here is that you can't use the Riak KV interface; you need to switch to the Solr interface.

Would you mind describing a bit more about the size and shape of your data (how many objects, average object size, object format, throughput, etc.) and ideally attach your Riak Search schema?

Thanks,
Rusty

On Tue, Jun 7, 2011 at 4:35 PM, Steve Webb <sw...@gnip.com> wrote:
> Hey there.
>
> I'm inserting twitter spritzer tweets into a bucket that doesn't have a
> precommit index hook, and a few fields from the tweet into a second bucket
> that does have the precommit hook.
>
> Speeds on the inserts into the indexed bucket are an order or magnitude
> slower than the non-indexed bucket.
>
> I'm using a 1GB ram, 20GB disk vmware VM, 2-node cluster, ubuntu 10.4,
> riaksearch 0.14.0 with n_val = 2.
>
> Is there a way to do a more lazy indexing to where it doesn't slow down
> inserts so much?
>
> - Steve
>
> --
> Steve Webb - Senior System Administrator for gnip.com
> http://twitter.com/GnipWebb
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
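The two suggestions in the reply above (surrogate documents and batched writes) could look roughly like the sketch below. It only builds Solr-style XML `<add>` payloads; it does not send them. The field names (`text`, `screen_name`, `original_key`), the `tweets` bucket name, and the helper names are all hypothetical and would need to match your own schema, and the exact update URL for the Solr interface should be taken from the Riak Search documentation for your version.

```python
from xml.etree import ElementTree as ET

# Hypothetical field names; substitute the fields your schema actually indexes.
INDEXED_FIELDS = ("text", "screen_name")


def make_surrogate(doc_id, original, bucket="tweets"):
    """Build a small document with only the indexed fields,
    plus a pointer back to the full object (hypothetical 'original_key')."""
    surrogate = {"id": doc_id, "original_key": f"{bucket}/{doc_id}"}
    for name in INDEXED_FIELDS:
        if name in original:
            surrogate[name] = str(original[name])
    return surrogate


def build_add_payload(docs):
    """Serialize several documents as one Solr-style XML <add> message."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = value
    return ET.tostring(add, encoding="unicode")


def batches(items, size):
    """Yield successive lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch
```

Each payload from `build_add_payload` would then be POSTed to the Solr update endpoint as a single request, so one request carries a whole batch of surrogate documents rather than one write per tweet. The batch size is something to tune by experiment.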