Forgot to Reply-All.

---------- Forwarded message ----------
From: Todd Tyree <t...@basho.com>
Date: Mon, Dec 16, 2013 at 10:43 AM
Subject: Re: Evaluating Riak for search
To: Cristian Bichis <cri...@imagis.ro>
Hi Cristian,

Firstly, I recommend you look at our Yokozuna project [0]. This is a tight integration of Riak and SOLR that will be released as Riak Search 2.0.

To answer your specific questions:

> what capacity/server specs could I start?

Have a look at our cluster capacity planning guide for a start [1]. For a production environment, you will need a minimum of five nodes for optimal performance. Our key-value read and write operations scale linearly, so you can easily increase performance by adding nodes.

Yokozuna search works using a coverage query, meaning that a quorum of nodes (ceil(number of nodes/2)) must respond for a query to be considered successful. Practically, this means that it does not scale linearly. I believe a standard five-node cluster will meet your search performance requirements, but you should do some benchmarking to ensure this is the case. Currently, Basho Bench does not support Yokozuna queries, so you will need to use another tool to perform these benchmarks. However, Yokozuna is compatible with all SOLR clients, so you should be able to use any SOLR benchmarking tool to test and optimise search performance.

Minimum per-node hardware recommendations can be found in our "Planning for a Riak System" guide [2].

> Can I start with just one box (the app will grow so definitely I will
> benefit from scaling features of Riak at later moment) for the above specs
> (200+ search queries, 5-20 write queries, 1 Million bucket size)?

No, I'm afraid not. At a minimum you will need a five-node cluster. In addition to the information from docs.basho.com, you should also read our blog entry "Why Your Riak Cluster Should Have At Least Five Nodes" [3]. I recommend you read and apply all of the recommended settings found in the 'Tuning' section of our documentation [4][5][6].

> what response times should I expect for search requests? How about the
> write requests?
> I don't have lined up here the queries and the data so this is impossible
> to know but I have no idea now how Riak works on performance. As you see
> for the moment scaling is not my focus. I plan to use the bench tool to do
> some testing but some overall insights still are helping.

This depends on your data and access patterns. As I said earlier, you should benchmark representative data.

> high offset (eg: list 10 items from a search located at offset 200,000)
> search requests how are expected to work?

You should be able to use the standard SOLR 'start' and 'rows' query parameters [7] to set up the offset and the number of results returned. Faceting may also be appropriate depending on your data [8]. You may need to optimise your SOLR queries for efficiency for these kinds of operations. Without knowing more about your data, I'm afraid I cannot recommend a specific strategy. Please be aware that Yokozuna shares the same limitations as SOLR's Distributed Search [9].

For the broad scenario you are describing, I suggest you use the bitcask backend and start with a ring size of 128 or 256. You can do an easy proof-of-concept on AWS EC2 instances. Specific AWS tuning recommendations can be found on the 'AWS Performance Tuning' page [10]. We frequently use m1.xlarge instances for exactly this purpose.
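As a rough illustration of the 'start' and 'rows' parameters mentioned above, here is a minimal Python sketch that builds the query string for the "10 items at offset 200,000" case. The host, port, path, index name and the example query are placeholders I have made up for illustration; please check the Yokozuna documentation for the actual search endpoint on your cluster. Only 'q', 'start', 'rows' and 'wt' are standard SOLR query parameters.

```python
from urllib.parse import urlencode


def build_search_url(base_url, query, offset, page_size):
    """Return a SOLR-style query URL that skips `offset` results
    and asks for at most `page_size` results."""
    params = {
        "q": query,         # standard SOLR query parameter
        "start": offset,    # zero-based offset of the first result returned
        "rows": page_size,  # maximum number of results returned
        "wt": "json",       # response format
    }
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and index name, used only for this example.
BASE_URL = "http://localhost:8098/search/my_index"

# The high-offset case from the question: 10 items starting at offset 200,000.
url = build_search_url(BASE_URL, "title:riak", 200000, 10)
print(url)
```

As far as I know, in a distributed SOLR search each shard generally has to gather its top start + rows candidates before the results are merged, so a large 'start' value is likely to cost more than a small one. It would be worth including a few high-offset requests like this one in your benchmarks.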
[0] https://github.com/basho/yokozuna
[1] http://docs.basho.com/riak/latest/ops/building/planning/cluster/
[2] http://docs.basho.com/riak/latest/ops/building/planning/system-planning/#Hardware
[3] http://basho.com/why-your-riak-cluster-should-have-at-least-five-nodes/
[4] http://docs.basho.com/riak/latest/ops/tuning/open-files-limit/
[5] http://docs.basho.com/riak/latest/ops/tuning/file-system/
[6] http://docs.basho.com/riak/latest/ops/tuning/linux/
[7] http://wiki.apache.org/solr/CommonQueryParameters
[8] http://wiki.apache.org/solr/SolrFacetingOverview
[9] http://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations
[10] http://docs.basho.com/riak/latest/ops/tuning/aws/

On Sun, Dec 15, 2013 at 7:56 AM, Cristian Bichis <cri...@imagis.ro> wrote:

> Hello,
>
> I have an application which is based on mysql + sphinx (for search) + PHP
> + caching. I am running the mysql part on a multi server master-slave and
> currently sphinx is over one box only.
>
> Currently I have performance issues with writes
> (INSERT/DELETE/UPDATE/REPLACE) on Sphinx and I am looking for an
> alternative on search part. I was checking for 1+ year for some alternative
> solutions (when I didn't have current issues with sphinx) and on the short
> list is Riak and Mark Logic.
>
> The app I am currently handling has about 130-170 search queries (SELECT
> with full text) at peak time (but occasionally can go to 200-500 qps) and
> 5-20 writes per second (INSERT/DELETE/UPDATE/REPLACE). The "bucket" size
> is close to 1 million. I am handling through sphinx mainly the search part,
> with only some non-search queries being sent to Sphinx because would take
> more to run on mysql.
>
> The reads/searches are fine, average is 0.04/query. But currently I am
> having issues because of the way Sphinx is handling writes (it seems writes
> are waiting for all reads to complete), a write can even take 7 seconds to
> finish.
> Beside some momentary optimizations, as the traffic goes up I only
> have the solution to bring more capacity (which is not impossible but won't
> help so much based on my tests, the performance/box is decreasing as we add
> more boxes). So I am ending up checking for Riak for search part of the app.
>
> My questions:
> * what capacity/server specs could I start? Can I start with just one box
> (the app will grow so definitely I will benefit from scaling features of
> Riak at later moment) for the above specs (200+ search queries, 5-20 write
> queries, 1 Million bucket size)?
>
> * what response times should I expect for search requests? How about the
> write requests? I don't have lined up here the queries and the data so
> this is impossible to know but I have no idea now how Riak works on
> performance. As you see for the moment scaling is not my focus. I plan to
> use the bench tool to do some testing but some overall insights still are
> helping.
>
> * high offset (eg: list 10 items from a search located at offset 200,000)
> search requests how are expected to work?
>
> Thank you!
> Cristian
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>

--
*Todd Tyree*
Client Services Engineer
Basho <http://www.basho.com/>
mobile: +44(0)7861 220 182
web: www.basho.com
github: tatyree <http://github.com/tatyree>