We have a document tagging system where documents are composed of two
types of data:
Rarely changed (hereafter: "immutable") data - document text and
metadata that we upload and almost never change. The text can be
hundreds of pages.
User-created (hereafter: "mutable") data - document properties
Yeah, the biggest issue for us is that we're using the SolrCloud features.
While I see some good things related to the Lucene and Solr code bases
being merged, this is certainly a frustrating aspect of it as I don't
require some of the changes that are in Lucene 4.0 (notwithstanding
anything that SolrCloud re
My personal view, as a bystander with no more information than you, is
that one has to assume there will be further index format changes before
a 4.0 release. This is based on the number of changes in the last 9
months, and the amount of activity on the dev list.
For us the implication is we
We have an application where every term position in a document is associated
with an "engine score".
A term query should then be scored according to the sum of "engine scores" of
the term in a document, rather than on the term frequency.
For example, term frequency of 5 with an average engine sco
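
A minimal sketch of the scoring rule described above, in plain Python rather
than Lucene API calls. The names build_index and score_term, and the sample
numbers in the usage note, are made up for illustration; the only thing taken
from the message is the idea that a term's score in a document is the sum of
its per-position "engine scores" instead of its term frequency.

```python
from collections import defaultdict


def build_index(docs):
    """Build term -> doc_id -> list of engine scores.

    docs maps a doc_id to a list of (term, engine_score) pairs,
    one pair per term position in that document.
    """
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, positions in docs.items():
        for term, engine_score in positions:
            index[term][doc_id].append(engine_score)
    return index


def score_term(index, term):
    """Rank documents for a single term by summed engine score.

    Term frequency (the number of positions) is deliberately ignored;
    only the sum of the per-position scores counts.
    """
    postings = index.get(term, {})
    totals = {doc_id: sum(scores) for doc_id, scores in postings.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

With hypothetical data such as a document "a" holding five positions scored
0.1 each and a document "b" holding two positions scored 0.9 and 0.8,
score_term ranks "b" first even though "a" has the higher term frequency.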
On Wed, Dec 7, 2011 at 00:41, Ilya Zavorin wrote:
> I need to implement a "quick and dirty" or "poor man's" translation of a
> foreign language document by looking up each word in a dictionary and
> replacing it with the English translation. So what I need is to tokenize
> the original foreign te
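
The word-by-word lookup described in the quoted message could be sketched
roughly as follows. This is plain Python, not a Lucene analyzer, and it
assumes the source language separates words with whitespace or punctuation;
the function name translate and the sample dictionary in the usage note are
hypothetical.

```python
import re

# Split the text into alternating runs of word characters and
# non-word characters so the original spacing and punctuation survive.
TOKEN_RE = re.compile(r"(\w+)|(\W+)")


def translate(text, dictionary):
    """Replace each word found in `dictionary` with its translation.

    Lookup is case-insensitive (keys are expected in lower case).
    Words that are not in the dictionary are left unchanged.
    """
    out = []
    for match in TOKEN_RE.finditer(text):
        word, other = match.groups()
        if word is not None:
            out.append(dictionary.get(word.lower(), word))
        else:
            out.append(other)
    return "".join(out)
```

For example, translate("Der Hund schläft.", {"der": "the", "hund": "dog"})
returns "the dog schläft.", leaving the unknown word as it was. Languages
that need stemming or are not whitespace-delimited would need a real
tokenizer in place of the regular expression.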
I have that use-case too: lots of indexes and each request is handled
by only one well-known index. For us it's working very well (but our
indexes are *small*: 1k-10k entries).
What we do is open/close the index reader/writer each time it's
needed, and reuse it if two requests need to access the
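
One way to sketch the open-on-demand approach above is a small
reference-counted pool. This assumes that "reuse" means two overlapping
requests for the same index share one open handle, and that the handle is
closed when the last request finishes. SharedHandles, open_fn and close_fn
are hypothetical names standing in for whatever opens and closes the real
reader or writer.

```python
import threading
from contextlib import contextmanager


class SharedHandles:
    """Open a handle per index on demand and share it between requests."""

    def __init__(self, open_fn, close_fn):
        self._open = open_fn      # called as open_fn(index_name)
        self._close = close_fn    # called as close_fn(handle)
        self._lock = threading.Lock()
        self._entries = {}        # index_name -> [handle, refcount]

    @contextmanager
    def use(self, index_name):
        # Open the handle if nobody holds it yet, otherwise reuse it.
        with self._lock:
            entry = self._entries.get(index_name)
            if entry is None:
                entry = [self._open(index_name), 0]
                self._entries[index_name] = entry
            entry[1] += 1
        try:
            yield entry[0]
        finally:
            # Close the handle once the last user is done with it.
            with self._lock:
                entry[1] -= 1
                if entry[1] == 0:
                    del self._entries[index_name]
                    self._close(entry[0])
```

A request would then wrap its work in "with pool.use(name) as handle:".
Opening happens while the lock is held, which keeps the sketch simple but
would serialize slow opens; with small indexes that may be acceptable.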
Hi Danil,
Thank you for answering once again.
You are right that we always know the file we are searching, the file location
is stored in a database.
Having done some testing, it seems to me that using one index per file yields
reasonable performance, just as you suggested.
For a 500K docs/index, I