
I'm currently evaluating solutions for indexing the contents of 
~1TB of SEC (Securities and Exchange Commission) documents.  File sizes range 
from a few KB to a couple hundred KB.  I started with Riak 
because ease of setting up and expanding a cluster is a primary requirement 
(ElasticSearch and Solr will probably be evaluated as well).  

I have a few specific questions that I was hoping people could help with:

        * In going through the search query documentation, I haven't found a
          way to extract the section of a result that contains the matches,
          similar to Google's search results page, where you see an excerpt of
          the page contents that matches your query.  Is something like this
          built in, so that it doesn't have to be done by the application?
        * Given that the documents total ~1TB of storage (not including the
          generated indexes), does decreasing the n_val make sense?  The
          documents are mostly bulk inserted on a daily or weekly basis; other
          than that, all operations are read-only.
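
To illustrate what I mean by doing the excerpt in the application, here is a minimal Python sketch of the fallback I would like to avoid.  It does not use any Riak API; the function name `excerpt` and the `width` parameter are made up for illustration, and it simply takes a window around the first matching query term in text the application has already fetched.

```python
import re


def excerpt(text, query, width=60):
    """Return a window of `text` around the first occurrence of any query term.

    Hypothetical helper for illustration only; not part of Riak or Riak Search.
    Returns None when the query is empty or no term is found.
    """
    # Escape each term so punctuation in the query is matched literally.
    terms = [re.escape(term) for term in query.split()]
    if not terms:
        return None

    match = re.search("|".join(terms), text, flags=re.IGNORECASE)
    if match is None:
        return None

    # Clamp the window to the bounds of the document.
    start = max(match.start() - width, 0)
    end = min(match.end() + width, len(text))

    # Mark truncation on either side with an ellipsis.
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + text[start:end] + suffix
```

Doing it this way means pulling the full document back for every hit on a results page, which is why I'm hoping something built in exists.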

Beyond these specific questions, if anyone has general insight into 
issues that might arise with a dataset like this in Riak, please feel free 
to mention it.
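
To make the n_val question concrete, here is the rough arithmetic as a small Python sketch.  It assumes raw storage scales linearly with n_val (Riak's default is 3) and ignores index size and per-object overhead, both of which I haven't measured; the function name is made up for illustration.

```python
def raw_storage_tb(dataset_tb, n_val):
    """Estimate cluster-wide raw storage for a dataset replicated n_val times.

    Assumption: storage scales linearly with n_val; search indexes and
    per-object overhead are not included.
    """
    return dataset_tb * n_val


if __name__ == "__main__":
    # ~1TB of documents at a few candidate replication factors.
    for n_val in (1, 2, 3):
        print(f"n_val={n_val}: ~{raw_storage_tb(1.0, n_val):.0f}TB raw")
```

A lower n_val saves disk but also leaves fewer replicas to read from and to fall back on when a node fails, which is the trade-off I'd like opinions on.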



riak-users mailing list
