I have that use-case too: lots of indexes, and each request is handled
by only one well-known index. For us it's working very well (but our
indexes are *small*: 1k-10k entries).
What we do is open/close the index reader/writer each time it's
needed, and reuse it if two requests need to access the same index …
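A minimal sketch of that open-per-request pattern, assuming the current
Lucene API (DirectoryReader/IndexSearcher); the index path and the "id"
field are illustrative, not from this thread:

    import java.nio.file.Paths;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.FSDirectory;

    public class PerRequestSearch {
        // Open the one index this request needs, search it, close it again.
        static TopDocs lookup(String indexPath, String docId) throws Exception {
            try (FSDirectory dir = FSDirectory.open(Paths.get(indexPath));
                 DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                return searcher.search(new TermQuery(new Term("id", docId)), 10);
            }
        }
    }

A small cache keyed by index path would give the "reuse it if two
requests need the same index" behaviour described above.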
Hi Danil,
Thank you for answering once again.
You are right that we always know the file we are searching; the file
location is stored in a database.
Having done some testing, it seems to me that one index per file yields
reasonable performance, just like you suggested. For 500K docs/index, I …
10B documents is a lot of data.
Index-per-file won't scale: you will not be able to open all the
indexes at the same time (file handle limits, memory limits, etc.), and
if you search through them sequentially, it will take a long time.
Unless in your use case you always know the file you are searching …
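For concreteness, the sequential scan being warned against would look
roughly like this (a sketch; the paths and the "id" field are
assumptions, and with ~20,000 indexes every lookup pays the full loop):

    import java.nio.file.Path;
    import java.util.List;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.FSDirectory;

    public class SequentialScan {
        // Worst case: open, search, and close every index until a hit.
        static TopDocs findAcrossAll(List<Path> indexPaths, String docId) throws Exception {
            for (Path p : indexPaths) {
                try (FSDirectory dir = FSDirectory.open(p);
                     DirectoryReader reader = DirectoryReader.open(dir)) {
                    TopDocs hits = new IndexSearcher(reader)
                            .search(new TermQuery(new Term("id", docId)), 1);
                    if (hits.scoreDocs.length > 0) {
                        return hits;  // found: stop scanning
                    }
                }
            }
            return null;  // not present in any index
        }
    }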
Hi Danil,
Thank you for your suggestions.
We will have approximately half a million documents per file, so using
your calculation, 20,000 files * 500,000 = 10,000,000,000. And we are
likely to get more files in the future, so a scalable solution is most
desirable.
The document IDs are not unique …
How many documents are there in the system?
You can approximate it by: #files * avg(docs/file).
From my understanding your queries will just be lookups for a document ID
(Q: are those IDs unique between files? or do you need to filter by filename?)
If that will be the only use case then maybe you should …
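The lookup described above is easy to sketch with the current Lucene
API; the field names "id" and "filename" are assumptions, and the
second variant covers the case where IDs are only unique per file:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;

    public class IdLookupQueries {
        // IDs unique everywhere: a single exact-match term query is enough.
        static Query byId(String docId) {
            return new TermQuery(new Term("id", docId));
        }

        // IDs only unique per file: AND the ID term with a filename term.
        static Query byIdAndFile(String docId, String filename) {
            return new BooleanQuery.Builder()
                    .add(new TermQuery(new Term("id", docId)), BooleanClause.Occur.MUST)
                    .add(new TermQuery(new Term("filename", filename)), BooleanClause.Occur.MUST)
                    .build();
        }
    }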
Hi Guys,
Thank you very much for your answers.
I will do some profiling on memory usage, but is there any documentation
on how Lucene uses/allocates memory?
Best wishes,
Rui Wang
On 6 Dec 2011, at 06:11, KARTHIK SHIVAKUMAR wrote:
> hi
>
>>> would the memory usage go through the roof?
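One concrete knob relevant to Rui's memory question is the IndexWriter
RAM buffer, which bounds how much indexing-side data Lucene holds in
memory before flushing. A sketch, assuming the current API; the 64 MB
value and the path are illustrative:

    import java.nio.file.Paths;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.FSDirectory;

    public class WriterMemoryConfig {
        public static void main(String[] args) throws Exception {
            IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
            cfg.setRAMBufferSizeMB(64.0);  // flush buffered documents at ~64 MB
            try (IndexWriter writer = new IndexWriter(
                    FSDirectory.open(Paths.get("/path/to/index")), cfg)) {
                // addDocument(...) calls go here; indexing memory stays
                // roughly bounded by this buffer plus merge overhead.
            }
        }
    }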
hi
>> would the memory usage go through the roof?
Yup
My past experience got me into pickles there...
with regards
karthik
On Mon, Dec 5, 2011 at 11:28 PM, Rui Wang wrote:
> Hi All,
>
> We are planning to use Lucene in our project, but not entirely sure about
> some of the design decisions …