Index size for Same DataSet.

2014-03-24 Thread Jose Carlos Canova
Hello, I have a question about index size. I am testing a program that uses Lucene to index a dataset. The final index size varies a little between runs, and since I haven't finished all the tests yet, I would like to know whether it is normal for the index size to vary across different runs over the same dataset. Regards.

Re: Aw: RE: Indexing and storing very large documents

2014-03-24 Thread Alexandre Patry
On 14-03-24 11:26 AM, Mirko Sertic wrote: Ah, ok, so I cannot use PostingsHighlighter as it requires stored fields, right? The field can be stored anywhere, not necessarily in the index. Here is something that might work: 1. Store the first N characters of your field in a database. 2. Override
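Step 1 of the suggestion above can be sketched in plain Java. This is a hypothetical helper (the method name `firstChars` and the idea of cutting at code-point boundaries are my assumptions, not part of the original post): it takes the first N characters of a field value without splitting a supplementary character in half, producing the prefix you would store in the database.

```java
public class FieldPrefix {
    // Hypothetical helper: keep the first maxChars Unicode code points of a
    // field value, so a surrogate pair is never cut in the middle.
    public static String firstChars(String text, int maxChars) {
        int codePoints = text.codePointCount(0, text.length());
        int end = text.offsetByCodePoints(0, Math.min(maxChars, codePoints));
        return text.substring(0, end);
    }
}
```

The stored prefix would then be retrieved from the database at highlight time instead of from the Lucene index.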

Incremental Indexing in Lucene 4.7

2014-03-24 Thread Yuan
We are using Lucene 3.6 to perform incremental indexing, following an algorithm we found on the web. 1. For each file that we index, we create a UID field and associate it with the file. The UID is computed from the file path and the last-modified time. 2. When perfor
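Step 1 above can be sketched in plain Java. The exact UID format is not given in the post, so the separator and the use of epoch milliseconds here are assumptions for illustration only:

```java
public class IndexUid {
    // Sketch of the UID scheme described above: combine the file path with
    // its last-modified timestamp. A re-saved file gets a new UID, so the
    // stale index entry can be found and deleted by its old UID.
    public static String uidFor(String path, long lastModifiedMillis) {
        return path + "|" + lastModifiedMillis;
    }
}
```

With a UID like this stored in its own field, the incremental pass can delete documents whose UID no longer matches any file on disk and re-add only the files whose UID is new.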

Re: Replicator: how to use it?

2014-03-24 Thread Roberto Franchini
On Thu, Mar 20, 2014 at 8:47 AM, Shai Erera wrote: >> >> Even if the commit is called just before the close, the close triggers >> a last commit. >> > > That seems wrong. If you do writer.commit() and then immediately > writer.close(), and there are no changes to the writer in between (i.e. a > th

RE: QueryParser

2014-03-24 Thread Allison, Timothy B.
To expand on Herb's comment, in Lucene, the StandardAnalyzer will break CJK into characters: 1 : 轻 2 : 歌 3 : 曼 4 : 舞 5 : 庆 6 : 元 7 : 旦 If you initialize the classic QueryParser with StandardAnalyzer, the parser will use that Analyzer to break this string into individual characters as above.
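The per-character behaviour described above can be illustrated in plain Java, without depending on Lucene itself. This is only a mimic of the output shown in the post (one token per character), not the StandardAnalyzer implementation:

```java
import java.util.ArrayList;
import java.util.List;

public class CjkChars {
    // Illustration of the behaviour described above: split a CJK string
    // into one token per character (code point), matching the numbered
    // per-character output StandardAnalyzer produces for this text.
    public static List<String> singleChars(String text) {
        List<String> tokens = new ArrayList<>();
        text.codePoints()
            .forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }
}
```

For "轻歌曼舞庆元旦" this yields the seven single-character tokens listed in the post, which is why the classic QueryParser, initialized with the same analyzer, splits the query string the same way.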

Aw: RE: Indexing and storing very large documents

2014-03-24 Thread Mirko Sertic
Ah, ok, so I cannot use PostingsHighlighter as it requires stored fields, right? Regards, Mirko Sent: Monday, 24 March 2014 at 16:01 From: "Uwe Schindler" To: java-user@lucene.apache.org Subject: RE: Indexing and storing very large documents Stored fields do not support Readers at the

RE: Indexing and storing very large documents

2014-03-24 Thread Uwe Schindler
Stored fields do not support Readers at the moment. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Mirko Sertic [mailto:mirko.ser...@web.de] > Sent: Monday, March 24, 2014 3:03 PM > To: java-user@lucene

Indexing and storing very large documents

2014-03-24 Thread Mirko Sertic
Hi there, I am searching for a way to store very large documents in a Lucene 4.7 index and keep them available for the PostingsHighlighter to highlight search results. I do not want to read the whole document into memory, as this would consume too much memory or could cause an OutOfHeapSpace

Re: QueryParser

2014-03-24 Thread Herb Roitblat
The default query parser for CJK languages breaks text into bigrams. A word consisting of characters ABCDE is broken into the tokens AB, BC, CD, DE, or "轻歌曼舞庆元旦" into data:轻歌 data:歌曼 data:曼舞 data:舞庆 data:庆元 data:元旦 Each pair may or may not be a word, but if you use the same parser (i.e. analyz
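The ABCDE → AB, BC, CD, DE scheme above can be sketched in plain Java. This is only an illustration of the bigram idea, not Lucene's CJKAnalyzer, and it assumes the input contains no supplementary (surrogate-pair) characters:

```java
import java.util.ArrayList;
import java.util.List;

public class CjkBigrams {
    // Sketch of the overlapping-bigram scheme described above:
    // "ABCDE" -> [AB, BC, CD, DE]. Assumes BMP-only input.
    public static List<String> bigrams(String text) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < text.length(); i++) {
            out.add(text.substring(i, i + 2));
        }
        return out;
    }
}
```

Because the same bigrams are produced at index time and at query time, matching works even when a given pair is not itself a real word.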

Re: QueryParser

2014-03-24 Thread Michael McCandless
Hi, There was a response to your question (by Timothy Allison) but maybe you didn't see it? Are you subscribed to the mailing list (java-user@lucene.apache.org)? Mike McCandless http://blog.mikemccandless.com On Mon, Mar 24, 2014 at 2:21 AM, kalaik wrote: > Dear Team, > > Any