Hi, 

 

First, there is also a migration guide next to the change log: 
http://lucene.apache.org/core/4_10_3/MIGRATE.html

 

1. If you implement an Analyzer, you have to override createComponents(), which 
returns a TokenStreamComponents object. See the source code of other Analyzers 
to understand how to use it. One simple example is in the Javadocs: 
http://lucene.apache.org/core/4_10_3/core/org/apache/lucene/analysis/Analyzer.html
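
As a rough sketch of what such an override can look like against Lucene 4.10.x (with the lucene-analyzers-common module on the classpath): the class name LowerCaseWhitespaceAnalyzer and the analyze() helper are made up for illustration, and the custom SeparationFilter from the original question is left out because its source is not shown. The Version-taking constructors are used to mirror the question's code.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

// Hypothetical analyzer: whitespace tokenization followed by lowercasing.
class LowerCaseWhitespaceAnalyzer extends Analyzer {
    private final Version version;

    LowerCaseWhitespaceAnalyzer(Version version) {
        this.version = version;
    }

    // Replaces the old tokenStream() override: build the chain once and
    // hand both ends (tokenizer and final filter) back to the Analyzer.
    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        Tokenizer source = new WhitespaceTokenizer(version, reader);
        TokenStream result = new LowerCaseFilter(version, source);
        return new TokenStreamComponents(source, result);
    }

    // Illustrative helper: runs the analyzer over a string and collects the terms.
    static List<String> analyze(Analyzer analyzer, String text) throws IOException {
        List<String> tokens = new ArrayList<String>();
        TokenStream ts = analyzer.tokenStream("body", new StringReader(text));
        try {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();                       // mandatory before incrementToken()
            while (ts.incrementToken()) {
                tokens.add(term.toString());
            }
            ts.end();
        } finally {
            ts.close();
        }
        return tokens;
    }
}
```

Further TokenFilters would be chained by wrapping "result" again before it is passed to TokenStreamComponents.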

 

2. Use initReader() to wrap filters around readers. This method is protected 
and can be overridden. CharFilter extends Reader, so you can wrap any CharFilter 
there. Your HTMLStripCharFilter has to be wrapped around the given reader here.

 

3./4. Term vectors are different in Lucene 4. Basically, term vectors are a 
small index for each document, and this is how it is implemented. You get back 
Fields/Terms instances, which are basically like AtomicReader’s backend; you 
can even execute a Query on the vectors:

IndexReader#getTermVector() returns Terms for a specific field:

<http://lucene.apache.org/core/4_10_3/core/org/apache/lucene/index/IndexReader.html#getTermVector(int,%20java.lang.String)>

For all fields (harder to use, since unwrapping for a specific field is what 
the method above does for you; this one is more for executing queries and so on):

<http://lucene.apache.org/core/4_10_3/core/org/apache/lucene/index/IndexReader.html#getTermVectors(int)>
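
To replace code that used TermPositionVector, something along these lines may work with Lucene 4.10.x. It is only a sketch: the class TermVectorDump and its output format are invented for illustration, and it assumes the field was indexed with term vectors (and positions, if those are wanted).

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.DocsAndPositionsEnum;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.BytesRef;

// Hypothetical helper: lists each term of one document's term vector
// together with its in-document frequency and positions.
final class TermVectorDump {
    static List<String> dump(IndexReader reader, int docId, String field) throws IOException {
        List<String> lines = new ArrayList<String>();
        Terms vector = reader.getTermVector(docId, field);
        if (vector == null) {
            return lines;                     // no term vector stored for this field
        }
        TermsEnum termsEnum = vector.iterator(null);
        DocsAndPositionsEnum postings = null;
        BytesRef term;
        while ((term = termsEnum.next()) != null) {
            StringBuilder line = new StringBuilder(term.utf8ToString());
            // For a term vector, totalTermFreq() is the frequency within this document.
            line.append(" freq=").append(termsEnum.totalTermFreq());
            // Returns null when positions were not stored with the vector.
            postings = termsEnum.docsAndPositions(null, postings);
            if (postings != null && postings.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
                int freq = postings.freq();
                for (int i = 0; i < freq; i++) {
                    line.append(i == 0 ? " positions=" : ",").append(postings.nextPosition());
                }
            }
            lines.add(line.toString());
        }
        return lines;
    }
}
```

The vector behaves like a one-document index, so the enum is positioned with nextDoc() once before freq() and nextPosition() are read.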

 

Uwe

 

-----

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de

eMail: u...@thetaphi.de

 

From: Martin Wunderlich [mailto:martin.wunderl...@gmx.net] 
Sent: Sunday, January 11, 2015 9:18 AM
To: java-user@lucene.apache.org
Subject: Upgrading Lucene from 3.5 to 4.10 - how to handle Java API changes

 

Hi all, 

 

I am currently in the process of upgrading a search engine application from 
Lucene 3.5.0 to version 4.10.3. There have been some substantial API changes in 
version 4 that break backward compatibility. I have managed to fix most of 
them, but a few issues remain that I could use some help with:

1.      "cannot override final method from Analyzer"

The original code extended the Analyzer class and then overrode 
tokenStream(...). 

@Override
public TokenStream tokenStream(String fieldName, Reader reader) {
    CharStream charStream = CharReader.get(reader);        
    return
        new LowerCaseFilter(version,
            new SeparationFilter(version,
                new WhitespaceTokenizer(version,
                    new HTMLStripFilter(charStream))));
}

But this method is final now and I am not sure how to understand the following 
note from the change log: 

"ReusableAnalyzerBase has been renamed to Analyzer. All Analyzer 
implementations must now use Analyzer.TokenStreamComponents, rather than 
overriding .tokenStream() and .reusableTokenStream() (which are now final). "

There is another problem in the method quoted above: 

2.      "The method get(Reader) is undefined for the type CharReader"

There seem to have been some considerable changes here, too. 

3.      "TermPositionVector cannot be resolved to a type"

This class is gone now in Lucene 4. Are there any simple fixes for this? From 
the change log: "The term vectors APIs (TermFreqVector, TermPositionVector, 
TermVectorMapper) have been removed in favor of the above flexible indexing 
APIs, presenting a single-document inverted index of the document from the term 
vectors."

Probably related to this: 4. "The method getTermFreqVector(int, String) is 
undefined for the type IndexReader."

Both problems occur here, for instance: 

TermPositionVector termVector = (TermPositionVector) 
reader.getTermFreqVector(...);

("reader" is of Type IndexReader)

I would appreciate any help with these issues. Thanks a lot in advance.

Cheers, 

Martin

 

PS: FYI, I have posted the same question on Stackoverflow: 
http://stackoverflow.com/questions/27881296/upgrading-lucene-from-3-5-to-4-10-how-to-handle-java-api-changes?noredirect=1#comment44166161_27881296
