Hello,

My Riak search cluster is timing out very often. I am indexing text content extracted from web pages containing news articles. My articles range in size from a few KB to tens of KB. I have put about 4.4 million articles into Riak, with an average article size of about 15 KB. The keys are MD5 ASCII hex hashes and the values are JSON.

When I set this system up I loaded it with 1 GB or so of data and played with the search system. Everything was kosher: it responded quickly and the search relevance was fine. Now that I've imported roughly 100x as much data I am getting timeouts. For example, the query "steve jobs died" times out. When I put in extremely specific conjunctive queries like "+steve +jobs +died +cupertino +apple" I get no results, but the query runs quickly. While the system is running a query that will time out, I see the coordinator Riak node consuming between one and two cores' worth of CPU.
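For anyone who wants to reproduce the comparison, here is a minimal sketch of how the two queries above could be timed over HTTP. It assumes the Solr-compatible search endpoint (`/solr/<index>/select`) is reachable on the default HTTP port of a local node. The index name `articles`, the base URL, and the 60-second client timeout are placeholders for illustration, not values from my setup.

```python
import time
import urllib.error
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8098"  # assumption: local node, default HTTP port
INDEX = "articles"                  # hypothetical index name

QUERIES = [
    "steve jobs died",                       # broad query that times out
    "+steve +jobs +died +cupertino +apple",  # narrow conjunctive query that returns fast
]


def build_search_url(query, rows=10, base_url=BASE_URL, index=INDEX):
    """Build a URL for the Solr-compatible select endpoint, URL-encoding the query."""
    params = urllib.parse.urlencode({"q": query, "rows": rows})
    return f"{base_url}/solr/{index}/select?{params}"


def time_query(query, timeout_s=60.0):
    """Run one query and return (elapsed seconds, short outcome string)."""
    url = build_search_url(query)
    started = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            resp.read()
            outcome = f"HTTP {resp.status}"
    except (urllib.error.URLError, OSError) as exc:
        # Covers HTTP errors, refused connections, and client-side timeouts.
        outcome = f"failed: {exc}"
    return time.monotonic() - started, outcome


if __name__ == "__main__":
    for q in QUERIES:
        elapsed, outcome = time_query(q)
        print(f"{q!r}: {elapsed:.2f}s ({outcome})")
```

Running this while watching CPU on the coordinating node should show whether the slow case is tied to the broad query rather than to the client.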
How can I configure Riak to stop timing out searches? I am open to changing my schema and query pattern if that's what I need to do.

app.config - https://gist.github.com/1352608
schema - https://gist.github.com/1352616
selected errors - https://gist.github.com/1c0976ced0f05ef0d5d6

Nodes in the cluster: 4
Hardware: EC2 m1.large with two disks in a RAID-0 on /mnt
Operating system: Linux ip-XXXX 2.6.38-11-virtual #50-Ubuntu SMP Mon Sep 12 21:51:23 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
Disk space consumed:
  66G /mnt/riak/leveldb
  36G /mnt/riak/merge_index
Disk space available: 800G

Spike Gronim
sp...@wavii.com
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com