Hello,

My Riak Search cluster is timing out very often. I am indexing text content
extracted from web pages containing news articles. The articles range in size
from a few KB to tens of KB. I have put about 4.4 million articles into Riak,
with an average article size of 15 KB. The keys are MD5 ASCII hex hashes and the
values are JSON. When I set this system up I loaded it with 1 GB or so of data
and played with the search system. Everything was kosher: it responded quickly
and the search relevance was fine. Now that I've imported 100x as much data I
am getting timeouts. For example, the query "steve jobs died" times out. When I
put in extremely specific conjunctive queries like "+steve +jobs +died
+cupertino +apple" I get no results, but the query returns quickly. While the
system is running a query that will time out, I see the coordinator Riak node
consuming between one and two cores' worth of CPU.
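
For anyone who wants to reproduce the comparison, here is a minimal sketch of
timing the two queries through Riak Search's Solr-compatible HTTP interface.
It is not the exact client code in use here. The node address, the index name
"articles", and the helper names solr_select_url and time_query are
assumptions made for illustration; the real index name depends on the schema
linked below.

```python
import time
import urllib.parse
import urllib.request

# Assumptions (not taken from the post): node address and index name.
RIAK_HTTP = "http://127.0.0.1:8098"  # 8098 is Riak's default HTTP port
INDEX = "articles"                   # hypothetical index name


def solr_select_url(query, rows=10, base=RIAK_HTTP, index=INDEX):
    """Build a URL for the Solr-compatible select endpoint."""
    # urlencode escapes "+" as %2B, so required-term markers survive the trip.
    params = urllib.parse.urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base}/solr/{index}/select?{params}"


def time_query(query, timeout_s=60.0):
    """Run one query and return (elapsed_seconds, outcome).

    outcome is the HTTP status on success, or the error text on failure,
    so a client-side timeout is reported rather than raised.
    """
    start = time.monotonic()
    try:
        url = solr_select_url(query)
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            resp.read()
            outcome = resp.status
    except Exception as exc:
        outcome = repr(exc)
    return time.monotonic() - start, outcome
```

Calling time_query("steve jobs died") and then
time_query("+steve +jobs +died +cupertino +apple") against a node gives the
elapsed time and outcome for the slow and the fast case side by side.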

How can I configure Riak to stop timing out searches? I am open to changing my 
schema and query pattern if that's what I need to do.

app.config - https://gist.github.com/1352608
schema - https://gist.github.com/1352616
selected errors - https://gist.github.com/1c0976ced0f05ef0d5d6

Nodes in the cluster: 4
Hardware: EC2 m1.large with two disks in a RAID-0 on /mnt
Operating system: Linux ip-XXXX 2.6.38-11-virtual #50-Ubuntu SMP Mon Sep 12 21:51:23 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
Disk space consumed:

66G  /mnt/riak/leveldb
36G  /mnt/riak/merge_index

Disk space available: 800G


Spike Gronim
sp...@wavii.com



_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
