DocIdSet to represent small number of hits in large Document set

2011-04-04 Thread Antony Bowesman
I'm converting a Lucene 2.3.2 to 2.4.1 (with a view to going to 2.9.4). Many of our indexes are 5M+ Documents, however, only a small subset of these are relevant to any user. As a DocIdSet, backed by a BitSet or OpenBitSet, is rather inefficient in terms of memory use, what is the recommended

Alternative field compression scheme

2011-04-04 Thread Garrick Toubassi
I am exploring alternative field compression schemes, designed to perform more effectively on small documents. In particular to be more effective compressing stored fields that show repetitive structure across fields, but not necessarily within a field. I have been working with a significant i

Re: OutOfMemoryError with FSDirectory

2011-04-04 Thread Claudio
Ok Erick, Thanks for your quick answer. FSDirectory will, indeed, store the index on disk. However, when *using* that index, lots of stuff happens. Specifically: When indexing, there is a buffer that accumulates documents until it's flushed to disk. Are you indexing? When searching (and this

Field Aware TokenFilter

2011-04-04 Thread Christopher Condit
I need to add synonyms to an index depending on the field being indexed. I know that TokenFilter is not "field aware", but is there a good way to get at the field or do I need to add something to allow my Analyzer to tell the TokenFilter which field is currently being examined? Thanks, -Chris ---

Re: Lucene Merge failing on Open Files

2011-04-04 Thread Simon Willnauer
On Mon, Apr 4, 2011 at 9:59 PM, Paul Taylor wrote: > On 04/04/2011 20:13, Michael McCandless wrote: >> >> How are you merging these indices?  (IW.addIndexes?). >> >> Are you changing any of IW's defaults, eg mergeFactor? >> >> Mike > > Hi Mike > > I have > > indexWriter.setMaxBufferedDocs(1);

Re: Lucene Merge failing on Open Files

2011-04-04 Thread Paul Taylor
On 04/04/2011 20:13, Michael McCandless wrote: How are you merging these indices? (IW.addIndexes?). Are you changing any of IW's defaults, eg mergeFactor? Mike Hi Mike I have indexWriter.setMaxBufferedDocs(1); indexWriter.setMergeFactor(3000); these are a hangover from earlier code, I

Re: Lucene Merge failing on Open Files

2011-04-04 Thread Michael McCandless
How are you merging these indices? (IW.addIndexes?). Are you changing any of IW's defaults, eg mergeFactor? Mike http://blog.mikemccandless.com On Mon, Apr 4, 2011 at 3:05 PM, Paul Taylor wrote: > Problem trying to merge indexes in the background whilst building some > others, works okay on m

Lucene Merge failing on Open Files

2011-04-04 Thread Paul Taylor
Problem trying to merge indexes in the background whilst building some others, works okay on my humble laptop but fails on another machine, although it seems to allow 700,000 file handles Exception in thread "Lucene Merge Thread #0" org.apache.lucene.index.MergePolicy$MergeException: java.io.F

Re: Undo hyphenation when indexing

2011-04-04 Thread Wulf Berschin
Thank you, Yonnik for this hint. (Again, I wasn't aware that obviously Solr offers useful extensions for the Lucene indexing process and I wonder why they haven't been added to Lucene itself.) Anyway, since the HyphenatedWordsFilter needs newlines in the input I will have to take another Toke

Re: OutOfMemoryError with FSDirectory

2011-04-04 Thread Erick Erickson
FSDirectory will, indeed, store the index on disk. However, when *using* that index, lots of stuff happens. Specifically: When indexing, there is a buffer that accumulates documents until it's flushed to disk. Are you indexing? When searching (and this is the more important part), various caches a

Re: indexing data without writing to disk ?

2011-04-04 Thread Patrick Diviacco
Ok, I've now seen the RAMDirectory class instead and I'm using it together with the IndexWriter... it should be ok now thanks On 4 April 2011 13:10, Patrick Diviacco wrote: > ok Thanks, > > When I use IndexWriter, I call addDocument method to add a new instance to > the index. > > addDocument takes

how to delete a RAMDirectory from memory

2011-04-04 Thread Patrick Diviacco
Since I need to overwrite an old RAMDirectory and I don't want memory leaks, I have the following code lines to first close the existing RAMDirectory and then create a new one. INDEX_DIR.close(); INDEX_DIR = new RAMDirectory(); However, I get the following exception. Should I remove close() line

OutOfMemoryError with FSDirectory

2011-04-04 Thread Claudio
Hi, I am using Lucene 2.9.4 with FSDirectory. My index has 80 thousand documents (each document has 12 fields). My JVM has 70 MB of RAM (limited by my hosting). I am getting various OutOfMemoryErrors. I ran jmap and I got: num #instances #bytes Class description

Re: indexing data without writing to disk ?

2011-04-04 Thread Patrick Diviacco
ok Thanks, When I use IndexWriter, I call the addDocument method to add a new instance to the index. addDocument takes Document doc, Analyzer analyzer In MemoryIndex, I have addField which wants: String fieldName, String text, Analyzer analyzer Not sure how I should pass the doc, should I get the f