Split mutable logical document into two Lucene documents

2011-12-07 Thread Brandon Mintern
We have a document tagging system where documents are composed of two types of data: rarely changed (hereafter "immutable") data, meaning document text and metadata that we upload and almost never change (the text can be hundreds of pages), and user-created (hereafter "mutable") data - document properties

Re: Lucene 4.0 Index Format Finalization Timetable

2011-12-07 Thread Jamie Johnson
Yeah, the biggest issue for us is that we're using the SolrCloud features. While I see some good things related to the Lucene and Solr code bases being merged, this is certainly a frustrating aspect of it, as I don't require some of the changes that are in Lucene 4.0 (notwithstanding anything that SolrCloud re

Re: Lucene 4.0 Index Format Finalization Timetable

2011-12-07 Thread Mike Sokolov
My personal view, as a bystander with no more information than you, is that one has to assume there will be further index format changes before a 4.0 release. This is based on the number of changes in the last 9 months, and the amount of activity on the dev list. For us the implication is we

Score per position

2011-12-07 Thread arnon ma
We have an application where every term position in a document is associated with an "engine score". A term query should then be scored according to the sum of "engine scores" of the term in a document, rather than on the term frequency. For example, term frequency of 5 with an average engine sco
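
To make the distinction in this question concrete, here is a minimal sketch in plain Python (not Lucene code) contrasting term frequency with a sum of per-position "engine scores". The data layout, the function names, and the numbers are assumptions made up for illustration; they do not come from the original message.

```python
from typing import Dict, List

# Hypothetical layout: for one document, map each term to the list of
# "engine scores" attached to its occurrences (one score per position).
PositionScores = Dict[str, List[float]]


def term_frequency(doc: PositionScores, term: str) -> int:
    """Number of positions at which the term occurs in the document."""
    return len(doc.get(term, []))


def engine_score_sum(doc: PositionScores, term: str) -> float:
    """Sum of the per-position engine scores for the term."""
    return sum(doc.get(term, []))


# Illustrative values only: doc_a has more occurrences, doc_b has
# fewer occurrences but higher per-position scores.
doc_a: PositionScores = {"lucene": [0.1, 0.1, 0.1, 0.1, 0.1]}
doc_b: PositionScores = {"lucene": [0.9, 0.8]}

# Ranking by term frequency prefers doc_a; ranking by summed engine
# score prefers doc_b.
by_tf = sorted([("a", doc_a), ("b", doc_b)],
               key=lambda d: term_frequency(d[1], "lucene"), reverse=True)
by_engine = sorted([("a", doc_a), ("b", doc_b)],
                   key=lambda d: engine_score_sum(d[1], "lucene"), reverse=True)
```

In Lucene itself, per-position values of this kind are usually carried as payloads, but how to wire that into scoring depends on the Lucene version and is not shown here.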

Re: tokenizing text using language analyzer but preserving stopwords if possible

2011-12-07 Thread Avi Rosenschein
On Wed, Dec 7, 2011 at 00:41, Ilya Zavorin wrote:
> I need to implement a "quick and dirty" or "poor man's" translation of a
> foreign language document by looking up each word in a dictionary and
> replacing it with the English translation. So what I need is to tokenize
> the original foreign te

Re: Use multiple lucene indices

2011-12-07 Thread Francisco A. Lozano
I have that use-case too: lots of indexes, and each request is handled by only one well-known index. For us it's working very well (but our indexes are *small*: 1k-10k entries). What we do is open/close the index reader/writer each time it's needed, and reuse it if two requests need to access the

Re: Use multiple lucene indices

2011-12-07 Thread Rui Wang
Hi Danil, Thank you for answering once again. You are right that we always know the file we are searching; the file location is stored in a database. Having done some testing, it seems to me that using one index per file yields reasonable performance, just as you suggested. For a 500K docs/index, I