+1
And there was another issue in my indexing framework: I have the
LowerCaseFilter in use, so the Term only matched when the value was all
lowercase ...
Thx
Clemens
-Original Message-
From: Michael McCandless [mailto:luc...@mikemccandless.com]
Sent: Thursday, June 19, 2014 1
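A minimal sketch of the mismatch described above (my own illustration, not from the thread, written against the Lucene 4.x APIs used elsewhere in this digest): TermQuery bypasses analysis, so when the analyzer lowercases at index time, only an already-lowercased query term matches. The field name and analyzer choice are illustrative.

import org.apache.lucene.analysis.core.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class LowercaseTermDemo {
  public static void main(String[] args) throws Exception {
    RAMDirectory dir = new RAMDirectory();
    // SimpleAnalyzer lowercases tokens; any chain ending in LowerCaseFilter behaves the same
    IndexWriter writer = new IndexWriter(dir,
        new IndexWriterConfig(Version.LUCENE_47, new SimpleAnalyzer(Version.LUCENE_47)));
    Document doc = new Document();
    doc.add(new TextField("test", "Hello", Store.YES));
    writer.addDocument(doc);
    writer.close();

    IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(dir));
    // TermQuery is not analyzed, so the casing must match what was indexed
    System.out.println(searcher.search(new TermQuery(new Term("test", "Hello")), 1).totalHits); // 0
    System.out.println(searcher.search(new TermQuery(new Term("test", "hello")), 1).totalHits); // 1
  }
}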
Ok, makes sense. Thanks for the info!
On Thu, Jun 19, 2014 at 3:05 PM, Robert Muir wrote:
> Don't extend that: extend Similarity.
>
> Some of those implementations actually rely on and optimize for the fact
> that it's a byte, and build lookup tables and so on.
>
> On Thu, Jun 19, 2014 at 6:03 PM, N
Don't extend that: extend Similarity.
Some of those implementations actually rely on and optimize for the fact
that it's a byte, and build lookup tables and so on.
On Thu, Jun 19, 2014 at 6:03 PM, Nalini Kartha wrote:
> Sorry, I meant the encodeNormValue and decodeNormValue methods on the
> TFIDFSimi
Sorry, I meant the encodeNormValue and decodeNormValue methods on the
TFIDFSimilarity class -
public byte encodeNormValue(float f)
public float decodeNormValue(byte b)
On Thu, Jun 19, 2014 at 12:08 PM, Robert Muir wrote:
> No they do not. The method is:
>
> public abstract long computeNorm(F
No they do not. The method is:
public abstract long computeNorm(FieldInvertState state);
On Thu, Jun 19, 2014 at 1:54 PM, Nalini Kartha wrote:
> Thanks for the info!
>
> We're more interested in changing the lengthNorm function than in using
> additional stats for scoring, so option 2 seems like t
Thanks for the info!
We're more interested in changing the lengthNorm function than in using
additional stats for scoring, so option 2 seems like the right way.
It looks like the encode and decode methods deal with bytes right now -
would changing those APIs to deal with longs instead be a good idea? I
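(Not from the thread, a hedged sketch against the Lucene 4.x API in use here: if only the length-norm function needs to change, DefaultSimilarity exposes lengthNorm(FieldInvertState), so an alternative curve can be plugged in without touching encodeNormValue/decodeNormValue. The result is still quantized into the single-byte norm, which is exactly the precision limitation being discussed; the exponent below is purely illustrative.)

import org.apache.lucene.index.FieldInvertState;
import org.apache.lucene.search.similarities.DefaultSimilarity;

public class FlatterLengthNormSimilarity extends DefaultSimilarity {
  @Override
  public float lengthNorm(FieldInvertState state) {
    // default is boost * 1/sqrt(numTerms); this flatter curve is made up for illustration
    int numTerms = state.getLength() - state.getNumOverlap();
    return state.getBoost() * (float) (1.0 / Math.pow(numTerms, 0.25));
  }
}

Such a Similarity has to be set both on the IndexWriterConfig (setSimilarity) and on the IndexSearcher so indexing and scoring agree.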
Hi,
You may not need to change the length-norm at all: if you want to support
*additional* statistics, add a docvalues field to your index where you can
store that information alongside the default Lucene statistics. You can then
use it for scoring via a function query. In fact, you c
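A rough sketch of that suggestion (my own illustration, Lucene 4.x-style APIs; the field name "termCount" and the token counting are made up): write the raw term count into a NumericDocValuesField at index time and read it per segment at search time, e.g. from a function query, custom query, or collector. Newer Lucene versions use an iterator-style NumericDocValues instead of get(docID).

import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.NumericDocValues;

public class RawLengthDocValues {

  // Index time: store a raw per-document term count next to the analyzed field.
  static void addDocument(IndexWriter writer, String text) throws IOException {
    Document doc = new Document();
    doc.add(new TextField("body", text, Store.NO));
    // illustrative: whitespace token count stands in for the real term count
    doc.add(new NumericDocValuesField("termCount", text.split("\\s+").length));
    writer.addDocument(doc);
  }

  // Search time: the stored count is readable per segment and per document.
  static void printCounts(IndexReader reader) throws IOException {
    for (AtomicReaderContext ctx : reader.leaves()) {
      NumericDocValues counts = ctx.reader().getNumericDocValues("termCount");
      if (counts == null) continue; // no values in this segment
      for (int docId = 0; docId < ctx.reader().maxDoc(); docId++) {
        System.out.println((ctx.docBase + docId) + " -> " + counts.get(docId));
      }
    }
  }
}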
Hi,
We're interested in having access to the number of terms in a document's
fields at scoring time, rather than the pre-calculated lengthNorm - we want
to experiment with different lengthNorm functions, so it seems like storing
the raw length and then doing the norm calculation at query time would work.
There is a bug in your test: you cannot use reader.maxDoc().
It's expected this would be 2 when (*) is commented out, because you
have 2 docs, one of which is deleted.
Use numDocs instead?
Mike McCandless
http://blog.mikemccandless.com
On Thu, Jun 19, 2014 at 12:54 PM, Clemens Wyss DEV wrote
directory = new SimpleFSDirectory(indexLocation);
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_47,
    new WhitespaceAnalyzer(Version.LUCENE_47));
indexWriter = new IndexWriter(directory, config);
Document doc = new Document();
String value = "hello";
String key = "test";
doc.
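To make the maxDoc()/numDocs() point concrete, a standalone sketch (my own, not from the thread): after deleting one of two documents, maxDoc() still counts the deleted document until merges reclaim it, while numDocs() reports only live documents.

import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class MaxDocVsNumDocs {
  public static void main(String[] args) throws Exception {
    RAMDirectory dir = new RAMDirectory();
    IndexWriter writer = new IndexWriter(dir,
        new IndexWriterConfig(Version.LUCENE_47, new WhitespaceAnalyzer(Version.LUCENE_47)));

    for (String id : new String[] { "1", "2" }) {
      Document doc = new Document();
      doc.add(new StringField("id", id, Store.YES));
      writer.addDocument(doc);
    }
    writer.deleteDocuments(new Term("id", "1")); // delete one of the two documents
    writer.close();

    DirectoryReader reader = DirectoryReader.open(dir);
    System.out.println("maxDoc  = " + reader.maxDoc());  // 2: deleted doc still counted
    System.out.println("numDocs = " + reader.numDocs()); // 1: live documents only
    reader.close();
  }
}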
Unfortunately I spoke too soon. While the original example seems to have
been fixed, I'm still getting some unexpected results.
As per your suggestion, I modified the Analyzer to:
@Override
protected TokenStreamComponents createComponents(String field, Reader in) {
NormalizeCharMa
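The quoted code is cut off above, so here is a hedged sketch of what an Analyzer using NormalizeCharMap typically looks like against the Lucene 4.x API; the actual mappings and token chain in the original analyzer are unknown, and the ones below are made up. Note that char filters are usually installed by overriding initReader so they are re-applied when the analyzer reuses token streams.

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.charfilter.MappingCharFilter;
import org.apache.lucene.analysis.charfilter.NormalizeCharMap;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.util.Version;

public class NormalizingAnalyzer extends Analyzer {
  private static final NormalizeCharMap CHAR_MAP;
  static {
    NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
    builder.add("ä", "ae"); // illustrative mappings only
    builder.add("ö", "oe");
    builder.add("ü", "ue");
    CHAR_MAP = builder.build();
  }

  @Override
  protected Reader initReader(String fieldName, Reader reader) {
    // char filters belong here so they survive token-stream reuse
    return new MappingCharFilter(CHAR_MAP, reader);
  }

  @Override
  protected TokenStreamComponents createComponents(String fieldName, Reader in) {
    Tokenizer source = new WhitespaceTokenizer(Version.LUCENE_47, in);
    TokenStream result = new LowerCaseFilter(Version.LUCENE_47, source);
    return new TokenStreamComponents(source, result);
  }
}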
Hi Simon, guys,
I see that in LUCENE-5038 the useCompoundFile stuff was refactored. Now I
think there are some problems with LogMergePolicy.
Example:
1. Set useCompoundFile to false and leave noCFSRatio unchanged (1.0 by
default).
2. Start indexing; a new segment will not use a compound file even if it's small