Hello,
I have a question about index size.
I am testing a program that uses Lucene to index a dataset.
The final index size varies a little between runs. Since I haven't finished
all the tests yet, I would like to know whether it is normal for the index
size to vary between different test runs.
Regards,
On 14-03-24 11:26 AM, Mirko Sertic wrote:
Ah, ok, so I cannot use PostingsHighlighter as it requires stored fields, right?
The field can be stored anywhere, not necessarily in the index. Here is
something that might work:
1. Store the first N characters of your field in a database (see the sketch below).
2. Override
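The quoted suggestion is cut off at "2. Override". One plausible reading (an assumption on my part, not confirmed in this excerpt) is that it means overriding PostingsHighlighter.loadFieldValues(), the protected hook the Lucene 4.x PostingsHighlighter uses to load the text it highlights (by default from stored fields). A rough sketch; loadFromDatabase() is a hypothetical helper for the database lookup of step 1:

import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.postingshighlight.PostingsHighlighter;

public class DbBackedHighlighter extends PostingsHighlighter {
    @Override
    protected String[][] loadFieldValues(IndexSearcher searcher, String[] fields,
                                         int[] docids, int maxLength) throws IOException {
        String[][] contents = new String[fields.length][docids.length];
        for (int f = 0; f < fields.length; f++) {
            for (int d = 0; d < docids.length; d++) {
                // Fetch the first N characters of the document body from the database
                // instead of from stored fields, and trim to maxLength.
                String text = loadFromDatabase(fields[f], docids[d]); // hypothetical lookup
                contents[f][d] = text.length() > maxLength ? text.substring(0, maxLength) : text;
            }
        }
        return contents;
    }

    private String loadFromDatabase(String field, int docid) {
        // Hypothetical: map the Lucene doc id to your own primary key and query the database.
        throw new UnsupportedOperationException("illustrative placeholder");
    }
}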
We are using Lucene 3.6 to perform incremental indexing, following an
algorithm we found on the web.
1. For each file that we index, we create a UID field to associate with it.
The UID is calculated from the file path and the last-modified time (see the
sketch after this list).
2. When perfor
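For reference, a minimal sketch of step 1 against the Lucene 3.6 API (field names are illustrative; step 2 is cut off above, so the UID check shown here is only one common way such a UID gets used, not necessarily the poster's):

import java.io.File;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class IncrementalIndexer {
    static void indexFile(IndexWriter writer, IndexReader reader, File file, String contents)
            throws Exception {
        // Step 1: UID derived from the file path and the last-modified time.
        String uid = file.getPath() + "|" + file.lastModified();

        // If this exact UID is already indexed, the file has not changed since the last run.
        if (reader.docFreq(new Term("uid", uid)) > 0) {
            return;
        }

        Document doc = new Document();
        doc.add(new Field("uid", uid, Field.Store.NO, Field.Index.NOT_ANALYZED_NO_NORMS));
        doc.add(new Field("path", file.getPath(), Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("contents", contents, Field.Store.NO, Field.Index.ANALYZED));

        // Replace any stale copy of the same file (same path, older UID) with the new version.
        writer.updateDocument(new Term("path", file.getPath()), doc);
    }
}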
On Thu, Mar 20, 2014 at 8:47 AM, Shai Erera wrote:
>>
>> Even if the commit is called just before the close, the close triggers
>> a last commit.
>>
>
> That seems wrong. If you do writer.commit() and then immediately
> writer.close(), and there are no changes to the writer in between (i.e. a
> th
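For reference, the pattern being debated looks like this (a sketch against the Lucene 4.x API; the index path and field are illustrative): an explicit commit() immediately followed by close(), with no changes to the writer in between.

import java.io.File;
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class CommitThenClose {
    public static void main(String[] args) throws IOException {
        try (Directory dir = FSDirectory.open(new File("/tmp/commit-close-demo"))) {
            IndexWriter writer = new IndexWriter(dir,
                    new IndexWriterConfig(Version.LUCENE_47, new StandardAnalyzer(Version.LUCENE_47)));

            Document doc = new Document();
            doc.add(new TextField("body", "some text", Field.Store.NO));
            writer.addDocument(doc);

            writer.commit(); // makes all pending changes durable
            writer.close();  // nothing changed since commit(), so there should be nothing left to flush
        }
    }
}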
To expand on Herb's comment: in Lucene, the StandardAnalyzer will break CJK
text into individual characters:
1 : 轻
2 : 歌
3 : 曼
4 : 舞
5 : 庆
6 : 元
7 : 旦
If you initialize the classic QueryParser with StandardAnalyzer, the parser
will use that Analyzer to break this string into individual characters as
above.
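For example, this small sketch (Lucene 4.x API; the field name is arbitrary) reproduces the per-character output above:

import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class StandardAnalyzerCjkDemo {
    public static void main(String[] args) throws Exception {
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_47);
        TokenStream ts = analyzer.tokenStream("data", new StringReader("轻歌曼舞庆元旦"));
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        ts.reset();
        int i = 0;
        while (ts.incrementToken()) {
            // Prints one token per character: 1 : 轻, 2 : 歌, ...
            System.out.println(++i + " : " + term.toString());
        }
        ts.end();
        ts.close();
        analyzer.close();
    }
}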
Ah, ok, so I cannot use PostingsHighlighter as it requires stored fields, right?
Regards
Mirko
Sent: Monday, 24 March 2014, 16:01
From: "Uwe Schindler"
To: java-user@lucene.apache.org
Subject: RE: Indexing and storing very large documents
Stored fields do not support Readers at the moment.
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> -----Original Message-----
> From: Mirko Sertic [mailto:mirko.ser...@web.de]
> Sent: Monday, March 24, 2014 3:03 PM
> To: java-user@lucene
Hi there
I am searching for a way to store very large documents in a Lucene 4.7 index
and keep them ready for search result highlighting with the
PostingsHighlighter.
I do not want to read the whole document into memory, as this would consume too
much memory or could cause an OutOfMemoryError (Java heap space).
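One possible workaround along the lines discussed elsewhere in this thread (a sketch only, not a confirmed solution; Lucene 4.7 API, with illustrative field names and prefix length): index the full body from a Reader, which avoids holding it in memory but cannot be stored, and keep only a bounded prefix in a separate stored field.

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.FieldInfo;
import org.apache.lucene.index.IndexWriter;

public class LargeDocIndexer {
    static void addLargeDocument(IndexWriter writer, File file) throws IOException {
        // PostingsHighlighter needs offsets in the postings, which TextField does not enable by default.
        FieldType bodyType = new FieldType(TextField.TYPE_NOT_STORED);
        bodyType.setIndexOptions(FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);

        Document doc = new Document();
        // The body is streamed from a Reader, so it is never held in memory all at once;
        // a Reader-backed field cannot be stored, which is the limitation Uwe describes above.
        doc.add(new Field("body", new BufferedReader(new FileReader(file)), bodyType));

        // Store only the first N characters in a separate field, e.g. for display or for a
        // highlighter that is told to load its text from somewhere other than the index.
        char[] prefix = new char[64 * 1024];
        try (BufferedReader r = new BufferedReader(new FileReader(file))) {
            int n = r.read(prefix, 0, prefix.length);
            doc.add(new StoredField("bodyPrefix", new String(prefix, 0, Math.max(n, 0))));
        }

        writer.addDocument(doc);
    }
}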
The default query parser for CJK languages breaks text into bigrams. A
word consisting of characters ABCDE is broken into tokens AB, BC, CD,
DE, or
"轻歌曼舞庆元旦"
into
data:轻歌 data:歌曼 data:曼舞 data:舞庆 data:庆元 data:元旦
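A sketch that reproduces this bigram output, assuming the analysis is done with CJKAnalyzer from lucene-analyzers-common (an assumption; the "data" field name matches the example above):

import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cjk.CJKAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class CjkBigramDemo {
    public static void main(String[] args) throws Exception {
        CJKAnalyzer analyzer = new CJKAnalyzer(Version.LUCENE_47);
        TokenStream ts = analyzer.tokenStream("data", new StringReader("轻歌曼舞庆元旦"));
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        ts.reset();
        while (ts.incrementToken()) {
            // Prints the overlapping bigrams: 轻歌 歌曼 曼舞 舞庆 庆元 元旦
            System.out.print("data:" + term.toString() + " ");
        }
        ts.end();
        ts.close();
        analyzer.close();
        System.out.println();
    }
}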
Each pair may or may not be a word, but if you use the same parser (i.e.
analyz
Hi,
There was a response to your question (by Timothy Allison) but maybe
you didn't see it? Are you subscribed to the mailing list
(java-user@lucene.apache.org)?
Mike McCandless
http://blog.mikemccandless.com
On Mon, Mar 24, 2014 at 2:21 AM, kalaik wrote:
> Dear Team,
>
> Any