Retrieving the index format

2007-03-30 Thread Dan Climan
Is there a way to tell which format an index is in? The file formats documentation (http://lucene.apache.org/java/docs/fileformats.html#Segments%20File) indicates that the segments file stores a Format value that can be used to determine the type. Format is -1 as of Lucene 1.4 and
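As a rough illustration of the idea in this question, the leading 32-bit integer of a segments file can be read directly, without going through the Lucene API. The sketch below assumes the file path is passed on the command line and relies on Lucene writing integers high-order byte first, which matches what `DataInputStream.readInt()` expects. The class name `SegmentsFormatProbe` and its helper methods are hypothetical, and treating a non-negative leading value as "no explicit Format field" is an assumption about older files, not something stated in the post.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of Lucene: peeks at the first int of a segments file.
public class SegmentsFormatProbe {

    // Reads the first four bytes of the stream as a big-endian signed int.
    static int readLeadingInt(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        return data.readInt();
    }

    // Interprets the leading int. Assumption: a negative value is an explicit
    // Format marker (for example -1), while a non-negative value means the
    // file starts with some other field and carries no Format value.
    static String describe(int leading) {
        if (leading < 0) {
            return "explicit Format value: " + leading;
        }
        return "no explicit Format value (leading int is " + leading + ")";
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SegmentsFormatProbe <path-to-segments-file>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(describe(readLeadingInt(in)));
        }
    }
}
```

This only inspects the first field of the file; it does not validate the rest of the file or say anything about the other index files.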

Re: Modifying the stored norm type

2006-06-20 Thread Dan Climan
>Paul Elschot <[EMAIL PROTECTED]> >>On Tuesday 20 June 2006 12:02, Marcus Falck wrote: >> After a lot of debugging and some API doc reading I have come to the conclusion that the static encodeNorm method of the Similarity class will encode my boost value into a single byte decimal number. >>
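To make the single-byte encoding mentioned above concrete, here is a standalone sketch of a one-byte float format with 3 mantissa bits and 5 exponent bits, modeled on the scheme in Lucene's `SmallFloat` class. Whether the Lucene release discussed in this thread behaves identically is an assumption; the class name `NormByteSketch` and the method names `encode` and `decode` are hypothetical and are not the `Similarity` API.

```java
// Hypothetical standalone sketch of a 1-byte float encoding
// (3 mantissa bits, 5 exponent bits); not Lucene's own class.
public class NormByteSketch {

    // Offset that maps the float exponent range onto the 5-bit exponent.
    private static final int ZERO_EXP_OFFSET = (63 - 15) << 3;

    // Compresses a float to one byte, dropping all but 3 mantissa bits.
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3);
        if (small <= ZERO_EXP_OFFSET) {
            // Zero and negatives map to 0; tiny positives to the smallest code.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= ZERO_EXP_OFFSET + 0x100) {
            // Values too large for the format saturate at the largest code.
            return (byte) -1;
        }
        return (byte) (small - ZERO_EXP_OFFSET);
    }

    // Expands a byte produced by encode() back to a float.
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        float[] boosts = {1.0f, 1.1f, 1.25f, 3.0f};
        for (float boost : boosts) {
            byte b = encode(boost);
            System.out.println(boost + " -> byte " + (b & 0xff) + " -> " + decode(b));
        }
    }
}
```

With only 3 mantissa bits, nearby values collapse onto the same byte: in this sketch 1.0 and 1.1 encode identically and both decode to 1.0, while 1.25 survives the round trip. Small differences in boost can therefore disappear once stored this way.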

Impact of Term Vectors (was ApacheCon next week)

2005-12-13 Thread Dan Climan
Good question. I was wondering about the impact of adding term vectors with the various options. For example, does adding term vectors with both positions and offsets have a significant impact? Which current parts of Lucene (including contributions) take advantage of term vectors being present? I know tha

Highlighter, Term Positions and Stopwords

2005-12-05 Thread Dan Climan
Do stop filters create non-contiguous token positions? I was interested in experimenting with the highlighter and using the TokenSources.getTokenStream(TermPositionVector tpv, boolean tokenPositionsGuaranteedContiguous) method. The javadocs for this method

Document visible by Term, but not search

2005-08-24 Thread Dan Climan
I am seeing the following strange behavior for an index. The index has been optimized and has no deletions. It's in compound file format. Using Luke 0.6 I can browse by Term and find my term (ItemId:727680). It's a Keyword field. Luke shows that the docfreq of this term is 1. It also shows all the document fie

Deleting duplicates from a Lucene index

2005-05-26 Thread Dan Climan
        } else {
            break;
        }
    }
    te.close();
    ir.close();
    //System.out.println("Number of ItemId Terms: " + numTerms);
} catch(Exception e) {
    System.err.print("Except