RE: MoreLikeThis Interface changes

Scott Smith Mon, 26 Sep 2011 11:07:20 -0700

So, I thought your response meant that I could eliminate my code:

        String[] fields = new String[1];
        fields[0] = "EVERYTHING";         // use the single "big" field in the index
        mlt.setFieldNames(fields);

But, if I comment out that code, my unit test fails.  If I include it, it 
passes.

I'm using MLT as follows:

            _query = new BooleanClause(mlt.like(new InputStreamReader(is), "EVERYTHING"),
                    BooleanClause.Occur.MUST);

"is" is the input stream.  Did I miss something in your response?

Scott

-----Original Message-----
From: Robert Muir [mailto:rcm...@gmail.com] 
Sent: Wednesday, September 21, 2011 6:59 PM
To: java-user@lucene.apache.org
Subject: Re: MoreLikeThis Interface changes

On Wed, Sep 21, 2011 at 5:17 PM, Scott Smith <ssm...@mainstreamdata.com> wrote:
> I'm updating my lucene code from 3.0 to 3.4.  There's a change in the MLT 
> interface I'm confused about.  I used the MLT.like(InputStream) method.  It 
> now appears I should change to the MLT.like(InputStreamReader, fieldname) 
> method.  Easy enough to create an InputStreamReader from an InputStream.

Yes, requiring a Reader ensures that MLT uses the encoding you want.
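To illustrate the encoding point, here is a minimal sketch using only the JDK (no Lucene calls). The class name ReaderEncodingSketch and the helper decode are hypothetical, chosen for illustration. It decodes the same bytes with two different charsets to show why the caller, not the library, should pick the charset when building the Reader:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of Lucene.
class ReaderEncodingSketch {

    // Read the whole stream as text, decoding it with the charset the caller chose.
    static String decode(InputStream in, Charset charset) {
        try (Reader reader = new InputStreamReader(in, charset)) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "café" encoded as UTF-8 is 5 bytes: the accented letter takes two.
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        String right = decode(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8);
        String wrong = decode(new ByteArrayInputStream(utf8), StandardCharsets.ISO_8859_1);

        System.out.println(right.length()); // 4: decoded as intended
        System.out.println(wrong.length()); // 5: each byte became its own character
    }
}
```

Note that new InputStreamReader(is) without a charset argument falls back to the platform default, so passing the charset explicitly is the safer habit when the encoding of the input is known.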

>
> So, my question is regarding the addition of the fieldname parameter.  
> There's also a call called MLT.setFieldNames(String[]).  This would seem to 
> be redundant except the setFieldNames() allows you to specify multiple fields 
> and like() doesn't.  Am I allowed to specify null as the fieldname in like() 
> (documentation doesn't say you can)?  It seems like you shouldn't need to do 
> both.  But there's a difference in functionality between the two (since one 
> allows multiple fields and the other doesn't).

A Reader has no fields :)
The fieldName is only passed to the Analyzer (from the javadoc: "@param
fieldName field passed to the analyzer to use when analyzing the content").
This is because some Analyzers (e.g. PerFieldAnalyzerWrapper) analyze
content differently according to different fields.

Previously, MoreLikeThis would use what was in the setFieldNames
parameter, iteratively like this:
for (String field : fieldNames) {
  // the same reader was handed to the analyzer for every field
  analyzer.tokenStream(field, reader);
}

However, MoreLikeThis also had a bug where it would never close() the
reader. As you can see, this logic was completely bogus, as you can only
consume the reader once.

Effectively the reader would be analyzed by fieldNames[0], then MLT
would analyze an exhausted reader with fieldNames[1]...fieldNames[n].

When we fixed MLT to close its resources correctly (around 3.2), it
exposed this second bug. If you tried to pass a reader with multiple
values in fieldNames, you would get an IOException because it tried to
re-consume a closed reader.
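Both failure modes described above can be reproduced with a plain java.io.Reader. The following is a minimal sketch that uses only the JDK and stands in for what MLT did internally; the class ReaderReuseSketch and its methods are hypothetical names, not Lucene code:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the old per-field loop; not Lucene code.
class ReaderReuseSketch {

    // Consume the reader to the end and report how many characters it yielded.
    static int drain(Reader reader) {
        try {
            int count = 0;
            while (reader.read() != -1) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Close the reader, then try to read again; true if that read throws IOException.
    static boolean failsAfterClose(Reader reader) {
        try {
            reader.close();
            reader.read();
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Reader reader = new StringReader("hello world");

        // First pass (think fieldNames[0]) sees all the content.
        System.out.println(drain(reader)); // 11

        // Second pass (think fieldNames[1]) gets an exhausted reader.
        System.out.println(drain(reader)); // 0

        // Once the reader is closed, a further read fails outright.
        System.out.println(failsAfterClose(reader)); // true
    }
}
```

The first case matches the silent pre-3.2 behaviour (later fields analyze nothing), and the second matches the IOException seen once the reader was closed properly.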

Now, when supplying a reader, you should instead pass in the
fieldName explicitly so that it analyzes the content the way you want.
For backwards compatibility, the deprecated method uses
fieldNames[0] only.

-- 
lucidimagination.com

