Re: speeding up riaksearch precommit indexing

Rusty Klophaus Thu, 09 Jun 2011 09:56:29 -0700

Hi Steve,

Riak does best with a lot of memory and a fast disk. Depending on how much
data you have in the system, putting two nodes into 1GB of memory on a
single VM may be causing the system to overrun available resources and page
out to disk. Depending on how you've set up your virtualized environment,
you could also be paying extra costs with each disk access, compounding the
problem. My first recommendation would be either to run the test again
while monitoring disk operations with iostat to see whether disk is the
problem, or to just go ahead and test on bigger hardware. I suspect you
will see much less of a performance difference between the tests once there
are ample resources.

That said, some slowdown is expected when you turn on indexing, as Riak
Search adds quite a bit of overhead in parsing and tokenizing the document,
and then storing the results.

There are two ways to speed up indexing:

   1. Reduce the size of your documents. If your documents are large, but
   you only need one or two fields indexed, you can create smaller "surrogate"
   documents with just the fields you need indexed, plus a link back to your
   original document.
   2. Batch your writes using the Solr interface. Riak Search uses
   "term-based partitioning". Term-based partitioning reduces complexity during
   queries, at the cost of increased complexity during writes.  You can gain
   back some of the lost performance by batching your writes. This allows the
   system to plan which messages it sends more intelligently, thus sending
   fewer messages and reducing overhead. The downside here is that you can't
   use the Riak KV interface; you need to switch to the Solr interface.
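
As a rough illustration of both suggestions, here is a minimal Python sketch.
It trims each document down to a surrogate and groups the surrogates into
Solr-style <add> payloads, so that many documents go out in one request. The
function names, the "original" link field, and the bucket name are all
hypothetical, and nothing here talks to a Riak node; it only builds the XML.

```python
import xml.etree.ElementTree as ET
from itertools import islice


def make_surrogate(key, original, fields, original_bucket):
    """Keep only the fields to index, plus a link back to the full document.

    The "original" field name and the "bucket/key" link format are
    assumptions made for this sketch, not a Riak Search convention.
    """
    surrogate = {"id": key, "original": f"{original_bucket}/{key}"}
    for name in fields:
        if name in original:
            surrogate[name] = original[name]
    return surrogate


def batches(items, size):
    """Yield lists of at most `size` items from any iterable."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def solr_add_xml(docs):
    """Render a batch of documents as a single Solr-style <add> payload."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)  # ElementTree escapes &, <, > for us
    return ET.tostring(add, encoding="unicode")
```

Each payload would then be POSTed as text/xml to the index's Solr update
endpoint (in Riak Search this should be something like /solr/INDEX/update,
but check the documentation for your version). The batch size is a tuning
knob; a few hundred documents per request is a reasonable place to start
experimenting.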

Would you mind describing a bit more about the size and shape of your
data (how many objects, average object size, object format, throughput,
etc.) and, ideally, attaching your Riak Search schema?

Thanks,
Rusty

On Tue, Jun 7, 2011 at 4:35 PM, Steve Webb <sw...@gnip.com> wrote:

> Hey there.
>
> I'm inserting twitter spritzer tweets into a bucket that doesn't have a
> precommit index hook, and a few fields from the tweet into a second bucket
> that does have the precommit hook.
>
> Speeds on the inserts into the indexed bucket are an order of magnitude
> slower than the non-indexed bucket.
>
> I'm using a 1GB ram, 20GB disk vmware VM, 2-node cluster, ubuntu 10.4,
> riaksearch 0.14.0 with n_val = 2.
>
> Is there a way to do lazier indexing so that it doesn't slow down
> inserts so much?
>
> - Steve
>
> --
> Steve Webb - Senior System Administrator for gnip.com
> http://twitter.com/GnipWebb
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
