Re: Use multiple lucene indices

Rui Wang Tue, 06 Dec 2011 01:03:58 -0800

Hi Guys,

Thank you very much for your answers.


I will do some profiling on memory usage, but is there any documentation on how
Lucene uses/allocates memory?

Best wishes,
Rui Wang


On 6 Dec 2011, at 06:11, KARTHIK SHIVAKUMAR wrote:

> hi
> 
>>> would the memory usage go through the roof?
> 
> Yup ....
> 
> My past experience got me into a pickle there...
> 
> 
> 
> with regards
> karthik
> 
> On Mon, Dec 5, 2011 at 11:28 PM, Rui Wang <[email protected]> wrote:
> 
>> Hi All,
>> 
>> We are planning to use Lucene in our project, but we are not entirely sure
>> about some of the design decisions we have made. The details are below; any
>> comments/suggestions are more than welcome.
>> 
>> The requirements of the project are below:
>> 
>> 1. We have tens of thousands of files, ranging in size from 500M to a
>> few terabytes, and the majority of the contents of these files will not be
>> accessed frequently.
>> 
>> 2. We are planning to keep less frequently accessed contents outside our
>> database and store them on the file system.
>> 
>> 3. We also have code to get the binary position of these contents in the
>> files. Using these binary positions, we can quickly retrieve the contents
>> and convert them into our domain objects.
>> 
>> We think Lucene provides a scalable solution for storing and indexing
>> these binary positions, so the idea is that each piece of content in
>> the files will be a document. Each document will have at least an ID field
>> to identify the content and a binary position field containing the start
>> and stop positions of the content. Having done some performance testing, it
>> seems to us that Lucene is well capable of doing this.
>> 
>> At the moment, we are planning to create one Lucene index per file, so if
>> new files are added to the system, we can simply generate a new
>> index. The problem is to do with searching: this approach means that we need
>> to create a new IndexSearcher every time a file is accessed through our
>> web service. We know that it is rather expensive to open a new
>> IndexSearcher, and are thinking of using some kind of pooling mechanism.
>> Our questions are:
>> 
>> 1. Is this one-index-per-file approach a viable solution? What do you
>> think about pooling IndexSearchers?
>> 
>> 2. If we have many IndexSearchers open at the same time, would the
>> memory usage go through the roof? I couldn't find any documentation on how
>> Lucene uses/allocates memory.
>> 
>> Thank you very much for your help.
>> 
>> Many thanks,
>> Rui Wang
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>> 
>> 
> 
> 
> -- 
> *N.S.KARTHIK
> R.M.S.COLONY
> BEHIND BANK OF INDIA
> R.M.V 2ND STAGE
> BANGALORE
> 560094*
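
For illustration only, the "pooling mechanism" raised in the original question could look something like the sketch below: a small LRU pool that keeps at most a fixed number of resources open and closes the least recently used one when a new one is needed. Nothing here comes from the thread or from Lucene itself; the names LruResourcePool, maxOpen and opener are hypothetical, and the sketch uses plain java.io.Closeable stand-ins rather than real IndexSearcher instances.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical LRU pool: keeps at most maxOpen resources open at once. */
final class LruResourcePool<K, V extends Closeable> {
    private final int maxOpen;
    private final Function<K, V> opener; // assumption: opens the resource for a key
    // accessOrder = true: iteration runs from least to most recently used.
    private final LinkedHashMap<K, V> open = new LinkedHashMap<>(16, 0.75f, true);

    LruResourcePool(int maxOpen, Function<K, V> opener) {
        if (maxOpen < 1) {
            throw new IllegalArgumentException("maxOpen must be at least 1");
        }
        this.maxOpen = maxOpen;
        this.opener = opener;
    }

    /** Returns the cached resource for key, opening it (and evicting) if needed. */
    synchronized V get(K key) {
        V resource = open.get(key); // a hit also marks the entry as most recently used
        if (resource == null) {
            resource = opener.apply(key);
            open.put(key, resource);
            if (open.size() > maxOpen) {
                evictLeastRecentlyUsed();
            }
        }
        return resource;
    }

    synchronized int openCount() {
        return open.size();
    }

    private void evictLeastRecentlyUsed() {
        Iterator<Map.Entry<K, V>> it = open.entrySet().iterator();
        V evicted = it.next().getValue(); // first entry is the least recently used
        it.remove();
        try {
            evicted.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in resources that only report when they are closed; a real opener
        // would open the per-file index and return its searcher.
        LruResourcePool<String, Closeable> pool = new LruResourcePool<String, Closeable>(
            2, name -> () -> System.out.println("closing " + name));
        pool.get("file-a");
        pool.get("file-b");
        pool.get("file-a"); // reused, and now the most recently used
        pool.get("file-c"); // over the limit, so the least recently used is closed
        System.out.println("open: " + pool.openCount());
    }
}
```

The cap on open entries is what bounds memory in this sketch. It deliberately ignores one real problem: an evicted resource may still be in use by another request, so an actual pool would need something like reference counting before closing.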


