Re: IndexWriter#updateDocument(Term, Document)

2014-06-19 Thread Clemens Wyss DEV
+1 And there was another issue in my indexing framework: I have the LowerCaseFilter in use, so the Term only matched if the value was all lowercase ... Thx Clemens -Original Message- From: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: Thursday, 19 June 2014 1
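The mismatch Clemens describes can be reproduced in a few lines of plain Java. This is a hypothetical toy model, not Lucene code (TermMatchDemo, analyze, addDocument and deleteByTerm are made-up names). It assumes the index stores the analyzed, lower-cased token, while the Term passed to updateDocument(Term, Document) is matched verbatim, without going through the analyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical toy model of term matching; not Lucene's API. */
public class TermMatchDemo {
    // Indexed key terms, one per live document, stored after "analysis".
    private final List<String> indexedKeys = new ArrayList<>();

    /** Stands in for an analyzer chain that ends in a lower-casing filter. */
    static String analyze(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    /** Index time: the key goes through the (toy) analyzer. */
    void addDocument(String key) {
        indexedKeys.add(analyze(key));
    }

    /** Delete time: the term text is compared exactly, with no analysis. */
    int deleteByTerm(String termText) {
        int before = indexedKeys.size();
        indexedKeys.removeIf(t -> t.equals(termText));
        return before - indexedKeys.size();
    }

    public static void main(String[] args) {
        TermMatchDemo index = new TermMatchDemo();
        index.addDocument("Hello");
        System.out.println("raw term deletes: " + index.deleteByTerm("Hello"));
        System.out.println("lower-cased term deletes: " + index.deleteByTerm(analyze("Hello")));
    }
}
```

The sketch suggests two ways out: normalize the key the same way before building the Term, or index the key field without analysis (for example as a StringField) so that stored and queried forms agree.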

Re: Changing field lengthnorm to store length

2014-06-19 Thread Nalini Kartha
Ok, makes sense. Thanks for the info! On Thu, Jun 19, 2014 at 3:05 PM, Robert Muir wrote: > Don't extend that: extend Similarity. > > Some of those implementations actually rely on and optimize for the fact > that it's a byte and build lookup tables and so on. > > On Thu, Jun 19, 2014 at 6:03 PM, N

Re: Changing field lengthnorm to store length

2014-06-19 Thread Robert Muir
Don't extend that: extend Similarity. Some of those implementations actually rely on and optimize for the fact that it's a byte and build lookup tables and so on. On Thu, Jun 19, 2014 at 6:03 PM, Nalini Kartha wrote: > Sorry, I meant the encodeNormValue and decodeNormValue methods on the > TFIDFSimi
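To see why a byte-sized norm cannot carry the raw length, here is a sketch of a one-byte norm encoding in plain Java. floatToByte315 and byte315ToFloat are re-implemented here on the assumption that they mirror Lucene's SmallFloat methods of the same name, which DefaultSimilarity in 4.x uses for its norms; lengthNorm ignores boosts. Because only 256 byte values exist, decoding can be a precomputed table, which is the kind of optimization Robert refers to.

```java
/** Sketch of a lossy one-byte length norm (assumed to mirror SmallFloat's 3-bit-mantissa format). */
public class ByteNormDemo {

    /** Float -> byte with 3 mantissa bits (implicit bit included) and zero exponent 15. */
    static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= ((63 - 15) << 3)) {
            // Zero and negatives map to 0; underflow maps to the smallest non-zero value.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (smallfloat >= ((63 - 15) << 3) + 0x100) {
            return -1; // overflow maps to the largest value
        }
        return (byte) (smallfloat - ((63 - 15) << 3));
    }

    /** Inverse mapping, byte -> float. */
    static float byte315ToFloat(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    /** Only 256 possible bytes, so decoding can be a lookup table. */
    static final float[] NORM_TABLE = new float[256];
    static {
        for (int i = 0; i < 256; i++) {
            NORM_TABLE[i] = byte315ToFloat((byte) i);
        }
    }

    /** Classic length norm, 1 / sqrt(number of terms), boosts ignored. */
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    static byte encode(int numTerms) {
        return floatToByte315(lengthNorm(numTerms));
    }

    static float decode(byte b) {
        return NORM_TABLE[b & 0xff];
    }

    public static void main(String[] args) {
        for (int len : new int[] {100, 110}) {
            byte b = encode(len);
            System.out.println(len + " terms -> byte " + b + " -> norm " + decode(b));
        }
    }
}
```

Under this encoding, fields of 100 and 110 terms get the same norm byte (both decode to 0.09375), so the exact length cannot be recovered at query time.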

Re: Changing field lengthnorm to store length

2014-06-19 Thread Nalini Kartha
Sorry, I meant the encodeNormValue and decodeNormValue methods on the TFIDFSimilarity class: public byte encodeNormValue(float f); public float decodeNormValue(byte b). On Thu, Jun 19, 2014 at 12:08 PM, Robert Muir wrote: > No, they do not. The method is: > > public abstract long computeNorm(F

Re: Changing field lengthnorm to store length

2014-06-19 Thread Robert Muir
No, they do not. The method is: public abstract long computeNorm(FieldInvertState state); On Thu, Jun 19, 2014 at 1:54 PM, Nalini Kartha wrote: > Thanks for the info! > > We're more interested in changing the lengthnorm function rather than using > additional stats for scoring, so option 2 seems like t

Re: Changing field lengthnorm to store length

2014-06-19 Thread Nalini Kartha
Thanks for the info! We're more interested in changing the lengthnorm function than in using additional stats for scoring, so option 2 seems like the right way. It looks like the encode and decode methods deal with bytes right now; would changing those APIs to deal with longs instead be a good idea? I

RE: Changing field lengthnorm to store length

2014-06-19 Thread Uwe Schindler
Hi, You may not need to change the length norm at all: if you want to support *additional* statistics, add a DocValues field to your index where you can store that information in addition to Lucene's default statistics. You can then use it for scoring via a function query. In fact, you c

Changing field lengthnorm to store length

2014-06-19 Thread Nalini Kartha
Hi, We're interested in having access at scoring time to the number of terms in a document's fields, rather than the pre-calculated lengthnorm. We want to experiment with different lengthnorm functions, so storing the raw length and then doing the norm calculation at query time seems like it would work.
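If the raw length is kept per document (for instance in a DocValues field, as Uwe suggests), the normalization function becomes a query-time choice. The sketch below is plain Java with hypothetical names and made-up lengths; the BM25-style factor and its b parameter are an assumed example of an alternative function, not something proposed in the thread.

```java
/** Hypothetical sketch: raw field lengths stored per doc, norm chosen at query time. */
public class QueryTimeLengthNorm {

    /** Classic TF-IDF style norm: 1 / sqrt(length). */
    static double classicNorm(long length) {
        return 1.0 / Math.sqrt(length);
    }

    /** BM25-style length factor: 1 - b + b * length / avgLength (b is an assumed parameter). */
    static double bm25LengthFactor(long length, double avgLength, double b) {
        return 1.0 - b + b * length / avgLength;
    }

    public static void main(String[] args) {
        // Made-up raw lengths, as they might be read back for each document.
        long[] rawLengths = {10, 100, 110};
        double avgLength = 0;
        for (long len : rawLengths) {
            avgLength += len;
        }
        avgLength /= rawLengths.length;

        // With exact lengths, both functions can be compared on the same index.
        for (long len : rawLengths) {
            System.out.printf("len=%d classic=%.4f bm25(b=0.75)=%.4f%n",
                    len, classicNorm(len), bm25LengthFactor(len, avgLength, 0.75));
        }
    }
}
```

Because the stored value is exact, 100 and 110 terms stay distinguishable, which a one-byte norm computed at index time would not guarantee.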

Re: IndexWriter#updateDocument(Term, Document)

2014-06-19 Thread Michael McCandless
There is a bug in your test: you cannot use reader.maxDoc(). It's expected this would be 2 when (*) is commented out, because you have 2 docs, one of which is deleted. Use numDocs instead? Mike McCandless http://blog.mikemccandless.com On Thu, Jun 19, 2014 at 12:54 PM, Clemens Wyss DEV wrote
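Mike's point can be illustrated with a toy index (hypothetical, not Lucene's API): updateDocument marks the old document deleted and appends a new one, and the deleted slot still counts toward maxDoc() until a merge reclaims it. In Lucene, maxDoc() equals numDocs() plus numDeletedDocs().

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical toy index showing maxDoc() vs numDocs() after an update; not Lucene's API. */
public class MaxDocVsNumDocs {
    private final List<String> keys = new ArrayList<>(); // doc id -> key term
    private final BitSet deleted = new BitSet();         // doc id -> deleted flag

    void addDocument(String key) {
        keys.add(key);
    }

    /** Mark every live doc with this exact key as deleted, then append the new doc. */
    void updateDocument(String key) {
        for (int doc = 0; doc < keys.size(); doc++) {
            if (!deleted.get(doc) && keys.get(doc).equals(key)) {
                deleted.set(doc);
            }
        }
        addDocument(key);
    }

    /** Counts every slot, deleted or not. */
    int maxDoc() {
        return keys.size();
    }

    /** Counts live documents only. */
    int numDocs() {
        return keys.size() - deleted.cardinality();
    }

    public static void main(String[] args) {
        MaxDocVsNumDocs index = new MaxDocVsNumDocs();
        index.addDocument("test");
        index.updateDocument("test");
        System.out.println("maxDoc=" + index.maxDoc() + " numDocs=" + index.numDocs());
    }
}
```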

Re: IndexWriter#updateDocument(Term, Document)

2014-06-19 Thread Clemens Wyss DEV
directory = new SimpleFSDirectory( indexLocation ); IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_47, new WhitespaceAnalyzer( Version.LUCENE_47 )); indexWriter = new IndexWriter( directory, config ); Document doc = new Document(); String value = "hello"; String key = "test"; doc.

Re: Lucene QueryParser/Analyzer inconsistency

2014-06-19 Thread Luis Pureza
Unfortunately I spoke too soon. While the original example seems to have been fixed, I'm still getting some unexpected results. As per your suggestion, I modified the Analyzer to: @Override protected TokenStreamComponents createComponents(String field, Reader in) { NormalizeCharMa

Inconsistency of LogMergePolicy and IWC.useCompoundFile

2014-06-19 Thread Duke DAI
Hi Simon, guys, I see that in LUCENE-5038 the useCompoundFile handling was refactored. Now I think there are some problems with LogMergePolicy. Example: 1. Set useCompoundFile to false and leave NOCFSRatio unchanged (1.0 by default). 2. Start indexing: a new segment will not use a compound file even if it's small
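A simplified model of the two decisions in the report may help. This is an assumption about Lucene 4.x after LUCENE-5038, not its actual source: newly flushed segments follow IndexWriterConfig's useCompoundFile flag, while merged segments are decided by the merge policy's noCFSRatio (the size cap maxCFSSegmentSize is left out). The class and method names are hypothetical.

```java
/** Hypothetical, simplified model of the flush-time vs merge-time compound-file decisions. */
public class CompoundFileDecision {

    /** Assumed: flushed segments follow the IndexWriterConfig flag alone. */
    static boolean flushedSegmentUsesCfs(boolean iwcUseCompoundFile) {
        return iwcUseCompoundFile;
    }

    /** Assumed: merged segments follow the merge policy's noCFSRatio alone. */
    static boolean mergedSegmentUsesCfs(double noCFSRatio, long mergedSize, long totalIndexSize) {
        if (noCFSRatio == 0.0) {
            return false; // never use compound files
        }
        if (noCFSRatio >= 1.0) {
            return true;  // always use compound files
        }
        return mergedSize <= noCFSRatio * totalIndexSize;
    }

    public static void main(String[] args) {
        boolean iwcUseCompoundFile = false; // step 1 of the report
        double noCFSRatio = 1.0;            // the default the report mentions
        System.out.println("flushed segment uses CFS: "
                + flushedSegmentUsesCfs(iwcUseCompoundFile));
        System.out.println("merged segment uses CFS: "
                + mergedSegmentUsesCfs(noCFSRatio, 1_000, 10_000));
    }
}
```

If the model holds, the two settings are independent, so a configuration like the one above yields non-compound flushed segments alongside compound merged ones.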