I'm converting from Lucene 2.3.2 to 2.4.1 (with a view to going to 2.9.4).
Many of our indexes hold 5M+ documents; however, only a small subset of these is
relevant to any given user. As a DocIdSet backed by a BitSet or OpenBitSet is
rather inefficient in terms of memory use, what is the recommended
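To put rough numbers on the memory concern above, here is a minimal sketch in plain Java. It does not use any Lucene class. The class name, the method names and the 20,000-document subset are assumptions made up for illustration, and object headers are ignored.

```java
import java.util.Arrays;

/** Rough memory comparison for doc-id sets; plain Java, not Lucene API. */
final class DocIdSetFootprint {

    /** Bytes used by a bit set with one bit per document, rounded up to 64-bit words. */
    static long bitSetBytes(int maxDoc) {
        long words = ((long) maxDoc + 63) / 64;
        return words * 8;
    }

    /** Bytes used by a sorted int[] that holds only the matching doc ids. */
    static long sortedIntArrayBytes(int matchingDocs) {
        return 4L * matchingDocs;
    }

    /** Membership test on a sorted id array via binary search. */
    static boolean contains(int[] sortedDocIds, int docId) {
        return Arrays.binarySearch(sortedDocIds, docId) >= 0;
    }

    public static void main(String[] args) {
        int maxDoc = 5_000_000;   // index size mentioned in the post
        int matching = 20_000;    // hypothetical per-user subset
        System.out.println("bit set bytes:   " + bitSetBytes(maxDoc));
        System.out.println("int array bytes: " + sortedIntArrayBytes(matching));
    }
}
```

Under these assumptions the bit set costs the same however few documents match, while the sorted array grows with the subset. The two break even at roughly maxDoc / 32 matching documents.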
I am exploring alternative field compression schemes designed to perform more
effectively on small documents, and in particular to compress more effectively
stored fields that show repetitive structure across fields, but not necessarily
within a field. I have been working with a significant i
Ok Erick,
Thanks for your quick answer.
FSDirectory will, indeed, store the index on disk. However,
when *using* that index, lots of stuff happens. Specifically:
When indexing, there is a buffer that accumulates documents
until it's flushed to disk. Are you indexing?
When searching (and this
I need to add synonyms to an index depending on the field being indexed.
I know that TokenFilter is not "field aware", but is there a good way to
get at the field or do I need to add something to allow my Analyzer to
tell the TokenFilter which field is currently being examined?
Thanks,
-Chris
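One way to picture the question above: in Lucene, Analyzer.tokenStream(String fieldName, Reader reader) receives the field name, so the analyzer can pass it on to whatever does the synonym lookup. The sketch below shows only that per-field dispatch in plain Java. It does not use Lucene classes, and FieldAwareSynonyms, addSynonyms and expand are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Per-field synonym expansion; plain Java, not Lucene API. */
final class FieldAwareSynonyms {
    // field name -> (term -> synonyms); all names here are hypothetical
    private final Map<String, Map<String, List<String>>> byField = new HashMap<>();

    /** Registers synonyms for a term within one field only. */
    void addSynonyms(String field, String term, String... synonyms) {
        byField.computeIfAbsent(field, f -> new HashMap<>())
               .put(term, Arrays.asList(synonyms));
    }

    /** Returns the tokens with this field's synonyms injected after each matching token. */
    List<String> expand(String field, List<String> tokens) {
        Map<String, List<String>> synonyms =
                byField.getOrDefault(field, Collections.emptyMap());
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token);
            out.addAll(synonyms.getOrDefault(token, Collections.emptyList()));
        }
        return out;
    }
}
```

In a real analyzer the same lookup would presumably sit inside a TokenFilter that is constructed with the field name; that wiring is left out here.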
---
On Mon, Apr 4, 2011 at 9:59 PM, Paul Taylor wrote:
> On 04/04/2011 20:13, Michael McCandless wrote:
>>
>> How are you merging these indices? (IW.addIndexes?).
>>
>> Are you changing any of IW's defaults, eg mergeFactor?
>>
>> Mike
>
> Hi Mike
>
> I have
>
> indexWriter.setMaxBufferedDocs(1);
On 04/04/2011 20:13, Michael McCandless wrote:
How are you merging these indices? (IW.addIndexes?).
Are you changing any of IW's defaults, eg mergeFactor?
Mike
Hi Mike
I have
indexWriter.setMaxBufferedDocs(1);
indexWriter.setMergeFactor(3000);
these are a hangover from earlier code, I
How are you merging these indices? (IW.addIndexes?).
Are you changing any of IW's defaults, eg mergeFactor?
Mike
http://blog.mikemccandless.com
On Mon, Apr 4, 2011 at 3:05 PM, Paul Taylor wrote:
> Problem trying to merge indexes in the background whilst building some
> others, works okay on m
Problem trying to merge indexes in the background whilst building some others,
works okay on my humble laptop but fails on another machine, although it seems
to allow 700,000 file handles
Exception in thread "Lucene Merge Thread #0"
org.apache.lucene.index.MergePolicy$MergeException: java.io.F
Thank you, Yonik, for this hint. (Again, I wasn't aware that Solr obviously
offers useful extensions for the Lucene indexing process, and I
wonder why they haven't been added to Lucene itself.)
Anyway, since the HyphenatedWordsFilter needs newlines in the input I
will have to take another Toke
FSDirectory will, indeed, store the index on disk. However,
when *using* that index, lots of stuff happens. Specifically:
When indexing, there is a buffer that accumulates documents
until it's flushed to disk. Are you indexing?
When searching (and this is the more important part), various
caches a
Ok, I've now seen the RAMDirectory class instead and I'm using it together with
the IndexWriter... it should be ok now, thanks
On 4 April 2011 13:10, Patrick Diviacco wrote:
> ok Thanks,
>
> When I use IndexWriter, I call addDocument method to add a new instance to
> the index.
>
> addDocument takes
Since I need to overwrite an old RAMDirectory and I don't want memory
leaks, I have the following lines of code to first close the existing
RAMDirectory and then create a new one.
INDEX_DIR.close();
INDEX_DIR = new RAMDirectory();
However, I get the following exception. Should I remove close() line
Hi,
I am using Lucene 2.9.4 with FSDirectory.
My index has 80 thousand documents (each document has 12 fields).
My JVM has 70 MB of RAM (limited by my hosting).
I am getting various OutOfMemoryErrors.
I ran jmap and I got:
num  #instances  #bytes  Class description
ok Thanks,
When I use IndexWriter, I call the addDocument method to add a new instance to
the index.
addDocument takes
Document doc, Analyzer analyzer
In MemoryIndex, I have addField which wants:
String fieldName, String text, Analyzer analyzer
Not sure how I should pass the doc; should I get the f