Fwd: Evaluating Riak for search

Todd Tyree Mon, 16 Dec 2013 02:45:51 -0800

Forgot to Reply-All.

---------- Forwarded message ----------
From: Todd Tyree <t...@basho.com>
Date: Mon, Dec 16, 2013 at 10:43 AM
Subject: Re: Evaluating Riak for search
To: Cristian Bichis <cri...@imagis.ro>

Hi Cristian,

Firstly, I recommend you look at our Yokozuna project [0].  This is a tight
integration of Riak and SOLR that will be released as Riak Search 2.0.

To answer your specific questions:

> what capacity/server specs could I start ?

Have a look at our cluster capacity planning guide for a start [1].  For a
production environment, you will need a minimum of five nodes for optimal
performance.  Our key-value read and write operations scale linearly, so
you can easily increase performance by adding nodes.

Yokozuna search works using a coverage query, meaning that a quorum of
nodes (ceil(number of nodes/2)) must respond for a query to be considered
successful.  Practically, this means that search throughput does not scale
linearly as you add nodes.  I believe a standard five-node cluster will meet
your search performance requirements, but you should benchmark to confirm
this.  Currently, Basho Bench does not support Yokozuna queries, so you
will need to use another tool to perform these benchmarks.

However, Yokozuna is compatible with all SOLR clients, so you should be
able to use any SOLR benchmarking tool to test and optimise search
performance.
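
As a rough illustration of the kind of benchmark meant here, the sketch
below times an arbitrary query function and summarises the latencies.  It
is only a minimal harness, not a replacement for a proper load-testing
tool: the names time_queries and fake_query are hypothetical, and
fake_query is a stand-in you would replace with a real request made
through your SOLR client.

```python
import statistics
import time
from typing import Callable, Dict


def time_queries(run_query: Callable[[], object], iterations: int = 100) -> Dict[str, float]:
    """Call run_query repeatedly and summarise latency in milliseconds."""
    if iterations < 2:
        raise ValueError("need at least two iterations")
    samples = []
    for _ in range(iterations):
        started = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - started) * 1000.0)
    samples.sort()
    # Nearest-rank style index for the 95th percentile of the sorted samples.
    p95_index = min(len(samples) - 1, int(round(0.95 * (len(samples) - 1))))
    return {
        "min_ms": samples[0],
        "median_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
        "max_ms": samples[-1],
    }


def fake_query() -> None:
    # Hypothetical stand-in for a real search request (assumption, not a Riak API).
    time.sleep(0.001)


if __name__ == "__main__":
    print(time_queries(fake_query, iterations=50))
```

This runs queries one at a time, so it measures latency rather than
throughput under concurrent load; for the 200+ queries per second case you
would also want to drive requests in parallel.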

Minimum per-node hardware recommendations can be found in our "Planning for
a Riak System" guide [2].

> Can I start with just one box (the app will grow so definetly I will
> benefit from scaling features of Riak at later moment) for the above specs
> (200+ search queries, 5-20 write queries, 1 Million bucket size) ?

No, I'm afraid not.  At a minimum you will need a five-node cluster.  In
addition to the information from docs.basho.com, you should also read our
blog entry "Why Your Riak Cluster Should Have At Least Five Nodes" [3].

I recommend you read and apply all of the recommended settings found in the
'Tuning' section of our documentation [4][5][6].

> what response times should I expect for search requests ? How about the
> write requests ? I don't have lined up here the queries and the data so
> this is impossible to know but I have no idea now how Riak works on
> performance. As you see for the moment scaling is not my focus. I plan to
> use the bench tool to do some testing but some overall insights still are
> helping.

This depends on your data and access patterns.  As I said earlier, you
should benchmark with representative data.

> high offset (eg: list 10 items from a search located at offset 200,000)
> search requests how are expected to work ?

You should be able to use the standard SOLR 'start' and 'rows' query
parameters [7] to set up the offset and the number of results returned.
Faceting may also be appropriate depending on your data [8].  You may need
to optimise your SOLR queries for efficiency for these kinds of operations.
Without knowing more about your data, I'm afraid I cannot recommend a
specific strategy.
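
To make the 'start' and 'rows' parameters concrete, here is a small sketch
that builds a query URL for the "10 items at offset 200,000" case.  The
host, port, path, index name and query string are placeholders I have made
up for illustration (check the Yokozuna documentation for the actual search
endpoint of your release); only 'q', 'start', 'rows' and 'wt' are standard
SOLR parameters.

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, query: str, offset: int, page_size: int) -> str:
    """Return a SOLR-style query URL using 'start' for the offset and 'rows' for the page size."""
    if offset < 0:
        raise ValueError("offset must not be negative")
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    params = {
        "q": query,         # SOLR query string
        "start": offset,    # zero-based offset of the first result to return
        "rows": page_size,  # number of results to return
        "wt": "json",       # response format
    }
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder endpoint and index name (assumptions, not taken from the Riak docs).
    print(build_search_url("http://localhost:8098/search/query/my_index",
                           "title:riak", offset=200000, page_size=10))
```

Timing this request at several offsets is an easy way to see how deep
paging behaves on your own data.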

Please be aware that Yokozuna shares the same limitations as SOLR's
Distributed Search [9].

For the broad scenario you are describing, I suggest you use the bitcask
backend and start with a ring size of 128 or 256.  You can do an easy
proof-of-concept on AWS EC2 instances.  Specific AWS tuning recommendations
can be found on the 'AWS Performance Tuning' page [10].  We frequently use
m1.xlarge instances for exactly this purpose.
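
For a feel of what those ring sizes mean on a five-node cluster, the sketch
below splits the ring's partitions as evenly as possible across the nodes.
This is only back-of-the-envelope arithmetic under the assumption of an
even split; the function name is hypothetical and Riak's real ownership
assignment may differ in detail.

```python
from typing import List


def partitions_per_node(ring_size: int, node_count: int) -> List[int]:
    """Split ring_size partitions as evenly as possible across node_count nodes."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    # Riak ring sizes are powers of two (64, 128, 256, ...).
    if ring_size <= 0 or ring_size & (ring_size - 1) != 0:
        raise ValueError("ring_size must be a power of two")
    base, extra = divmod(ring_size, node_count)
    # The first 'extra' nodes each take one partition more than the rest.
    return [base + 1 if i < extra else base for i in range(node_count)]


if __name__ == "__main__":
    for ring_size in (128, 256):
        print(ring_size, partitions_per_node(ring_size, 5))
```

With five nodes this gives 25 or 26 partitions per node for a ring size of
128, and 51 or 52 for 256.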

[0] https://github.com/basho/yokozuna
[1] http://docs.basho.com/riak/latest/ops/building/planning/cluster/
[2] http://docs.basho.com/riak/latest/ops/building/planning/system-planning/#Hardware
[3] http://basho.com/why-your-riak-cluster-should-have-at-least-five-nodes/
[4] http://docs.basho.com/riak/latest/ops/tuning/open-files-limit/
[5] http://docs.basho.com/riak/latest/ops/tuning/file-system/
[6] http://docs.basho.com/riak/latest/ops/tuning/linux/
[7] http://wiki.apache.org/solr/CommonQueryParameters
[8] http://wiki.apache.org/solr/SolrFacetingOverview
[9] http://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations
[10] http://docs.basho.com/riak/latest/ops/tuning/aws/

On Sun, Dec 15, 2013 at 7:56 AM, Cristian Bichis <cri...@imagis.ro> wrote:

>  Hello,
>
> I have an application which is based on mysql + sphinx (for search) + PHP
> + caching. I am running the mysql part on a multi server master-slave and
> currently sphinx is over one box only.
>
> Currently I have performance issues with writes
> (INSERT/DELETE/UPDATE/REPLACE) on Sphinx and I am looking for an
> alternative on search part. I was checking by 1+ year for some alternative
> solutions (when I didn't had current issues with sphinx) and on the short
> list is Riak and Mark Logic.
>
> The app I am currently handing has about 130-170 search queries (SELECT
> with full text) at peak time (but occasionally can go to 200-500 qps) and
> 5-20 writes per second (INSERT/DELETE/UPDATE/REPLACE). The "bucket" size
> is close to 1 million. I am handling through sphinx mainly the search part,
> with only some non-search queries been sent to Sphinx because would take
> more to run on mysql.
>
> The reads/searches are fine, average is 0.04/query. But currently I am
> having issues because of the way Sphinx is handling writes (it seems writes
> are waiting for all reads to complete), a write can even take 7 seconds to
> finish. Beside some momentary optimizations, as the traffic goes up I only
> have the solution to bring more capacity (which is not impossible but wont
> help so much based on my tests, the performance/box is decreasing as we add
> more boxes). So I am ending up checking for Riak for search part of the app.
>
> My questions:
> * what capacity/server specs could I start ? Can I start with just one box
> (the app will grow so definetly I will benefit from scaling features of
> Riak at later moment) for the above specs (200+ search queries, 5-20 write
> queries, 1 Million bucket size) ?
>
> * what response times should I expect for search requests ? How about the
> write requests ? I don't have lined up here the queries and the data so
> this is impossible to know but I have no idea now how Riak works on
> performance. As you see for the moment scaling is not my focus. I plan to
> use the bench tool to do some testing but some overall insights still are
> helping.
>
> * high offset (eg: list 10 items from a search located at offset 200,000)
> search requests how are expected to work ?
>
> Thank you!
> Cristian
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>

-- 
*Todd Tyree*
Client Services Engineer Basho <http://www.basho.com/>

mobile: +44(0)7861 220 182
web: www.basho.com
github: tatyree <http://github.com/tatyree>

