Re: Upgrading Lucene from 3.5 to 4.10 - how to handle Java API changes

Martin Wunderlich Sun, 11 Jan 2015 12:26:45 -0800

Hi Uwe, 

Thanks a lot for the detailed reply. I'll see how far I get with it, but being 
quite new to Lucene, it seems I am lacking a bit of background information to 
fully understand the response below. In particular, I need to do some 
background reading on how token streams and readers work, I guess.


Cheers, 

Martin
 

On 11.01.2015 at 11:05, Uwe Schindler <u...@thetaphi.de> wrote:

> Hi, 
> 
> 
> 
> First, there is also a migration guide alongside the change log: 
> http://lucene.apache.org/core/4_10_3/MIGRATE.html
> 
> 
> 
> 1. If you implement an Analyzer, you have to override createComponents(), which 
> returns a TokenStreamComponents object. See the source code of other Analyzers to 
> understand how to use it. A simple example is in the Javadocs: 
> http://lucene.apache.org/core/4_10_3/core/org/apache/lucene/analysis/Analyzer.html
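A minimal sketch of point 1, assuming Lucene 4.10.x with lucene-core and lucene-analyzers-common on the classpath. The class name LowercaseWhitespaceAnalyzer and the analyze() helper are hypothetical. Martin's own SeparationFilter is not shown in the thread, so the place where it would be wrapped is only marked in a comment. Instead of overriding the now-final tokenStream(), the subclass returns the tokenizer (source) and the last filter (sink) from createComponents(), and Analyzer reuses them.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

// Hypothetical Lucene 4 style analyzer: the chain is built in createComponents().
public final class LowercaseWhitespaceAnalyzer extends Analyzer {
    private final Version matchVersion;

    public LowercaseWhitespaceAnalyzer(Version matchVersion) {
        this.matchVersion = matchVersion;
    }

    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        // The Tokenizer is the source of the chain and consumes the Reader.
        Tokenizer source = new WhitespaceTokenizer(matchVersion, reader);
        // Filters wrap the source; a custom filter such as Martin's
        // SeparationFilter would also be wrapped around 'source' here.
        TokenStream result = new LowerCaseFilter(matchVersion, source);
        // Analyzer caches and reuses these components across tokenStream() calls.
        return new TokenStreamComponents(source, result);
    }

    // Hypothetical helper: runs an analyzer over text and collects the terms.
    public static List<String> analyze(Analyzer analyzer, String text) throws IOException {
        List<String> terms = new ArrayList<>();
        try (TokenStream ts = analyzer.tokenStream("body", text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();                 // required before the first incrementToken()
            while (ts.incrementToken()) {
                terms.add(term.toString());
            }
            ts.end();                   // close() follows via try-with-resources
        }
        return terms;
    }
}
```

Usage would be along the lines of analyze(new LowercaseWhitespaceAnalyzer(version), "Hello Lucene WORLD"), which should yield the terms hello, lucene and world.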
> 
> 
> 
> 2. Use initReader() to wrap filters around readers. This method is protected 
> and can be overridden. CharFilter extends Reader, so you can wrap any 
> CharFilter there. Your HTMLStripCharFilter has to be wrapped around the given 
> reader here.
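A minimal sketch of point 2, assuming Lucene 4.10.x with lucene-analyzers-common, where HTMLStripCharFilter lives in org.apache.lucene.analysis.charfilter and takes a plain Reader. The class name HtmlStrippingAnalyzer is hypothetical. The CharFilter returned by initReader() replaces the Lucene 3.x CharReader.get(reader) / CharStream step, and Analyzer passes that filtered reader on to createComponents().

```java
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.charfilter.HTMLStripCharFilter;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.util.Version;

// Hypothetical analyzer showing where a CharFilter goes in Lucene 4.
public final class HtmlStrippingAnalyzer extends Analyzer {
    private final Version matchVersion;

    public HtmlStrippingAnalyzer(Version matchVersion) {
        this.matchVersion = matchVersion;
    }

    @Override
    protected Reader initReader(String fieldName, Reader reader) {
        // A CharFilter is a Reader, so it wraps the raw input before tokenization.
        return new HTMLStripCharFilter(reader);
    }

    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        // 'reader' here is the HTMLStripCharFilter returned by initReader().
        Tokenizer source = new WhitespaceTokenizer(matchVersion, reader);
        return new TokenStreamComponents(source);
    }
}
```

Keeping the CharFilter in initReader() rather than inside createComponents() matters because Analyzer calls initReader() on every tokenStream() call, including when it reuses cached components.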
> 
> 
> 
> 3./4. Term vectors are different in Lucene 4. Basically, term vectors are a 
> small index for each document, and that is how they are implemented. You get 
> back Fields/Terms instances, which are basically like an AtomicReader's 
> backend; you can even execute a Query on the vectors:
> 
> IndexReader#getTermVector() returns Terms for a specific field:
> 
> <http://lucene.apache.org/core/4_10_3/core/org/apache/lucene/index/IndexReader.html#getTermVector(int,%20java.lang.String)>
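A minimal sketch of replacing the Lucene 3.x TermPositionVector lookup with IndexReader#getTermVector(), assuming Lucene 4.10.x and a field indexed with term vectors and term vector positions enabled. The class TermVectorPositions and its positions() method are hypothetical. Since a term vector is a one-document index, each term's postings contain exactly one document.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.apache.lucene.index.DocsAndPositionsEnum;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

// Hypothetical helper: term -> positions for one field of one document.
public final class TermVectorPositions {

    // Returns an empty map if the field has no term vector or no stored positions.
    public static Map<String, List<Integer>> positions(IndexReader reader, int docId, String field)
            throws IOException {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        Terms vector = reader.getTermVector(docId, field);  // replaces getTermFreqVector()
        if (vector == null) {
            return result;
        }
        TermsEnum termsEnum = vector.iterator(null);         // 4.x takes a reuse argument
        DocsAndPositionsEnum postings = null;
        BytesRef term;
        while ((term = termsEnum.next()) != null) {
            postings = termsEnum.docsAndPositions(null, postings);
            if (postings == null) {
                // The field was indexed without term vector positions.
                return result;
            }
            postings.nextDoc();                              // the vector's single document
            int freq = postings.freq();
            List<Integer> positions = new ArrayList<>(freq);
            for (int i = 0; i < freq; i++) {
                positions.add(postings.nextPosition());
            }
            result.put(term.utf8ToString(), positions);
        }
        return result;
    }
}
```

On the indexing side, positions have to be requested explicitly, for example with a FieldType on which setStoreTermVectors(true) and setStoreTermVectorPositions(true) are called.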
> 
> For all fields (harder to use; the method above already unwraps a single 
> field, while this one is more for executing queries and so on):
> 
> <http://lucene.apache.org/core/4_10_3/core/org/apache/lucene/index/IndexReader.html#getTermVectors(int)>
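A minimal sketch of walking all term vectors of a document via IndexReader#getTermVectors(int), assuming Lucene 4.10.x, where Fields is Iterable over field names. The class AllTermVectors and the "field:term" key format are hypothetical choices for illustration. It relies on the fact that, inside a single-document term vector, TermsEnum#totalTermFreq() is the term's frequency in that document.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.lucene.index.Fields;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

// Hypothetical helper: "field:term" -> within-document frequency for one document.
public final class AllTermVectors {

    public static Map<String, Long> frequencies(IndexReader reader, int docId) throws IOException {
        Map<String, Long> result = new LinkedHashMap<>();
        Fields fields = reader.getTermVectors(docId);  // null if no term vectors were indexed
        if (fields == null) {
            return result;
        }
        for (String field : fields) {                  // iterates field names
            Terms terms = fields.terms(field);
            if (terms == null) {
                continue;
            }
            TermsEnum termsEnum = terms.iterator(null);
            BytesRef term;
            while ((term = termsEnum.next()) != null) {
                result.put(field + ":" + term.utf8ToString(), termsEnum.totalTermFreq());
            }
        }
        return result;
    }
}
```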
> 
> 
> 
> Uwe
> 
> 
> 
> -----
> 
> Uwe Schindler
> 
> H.-H.-Meier-Allee 63, D-28213 Bremen
> 
> <http://www.thetaphi.de/> http://www.thetaphi.de
> 
> eMail: u...@thetaphi.de
> 
> 
> 
> From: Martin Wunderlich [mailto:martin.wunderl...@gmx.net] 
> Sent: Sunday, January 11, 2015 9:18 AM
> To: java-user@lucene.apache.org
> Subject: Upgrading Lucene from 3.5 to 4.10 - how to handle Java API changes
> 
> 
> 
> Hi all, 
> 
> 
> 
> I am currently in the process of upgrading a search engine application from 
> Lucene 3.5.0 to version 4.10.3. There have been some substantial API changes 
> in version 4 that break backward compatibility. I have managed to fix most of 
> them, but a few issues remain that I could use some help with:
> 
> 1.    "cannot override final method from Analyzer"
> 
> The original code extended the Analyzer class and then overrode 
> tokenStream(...). 
> 
> @Override
> public TokenStream tokenStream(String fieldName, Reader reader) {
>    CharStream charStream = CharReader.get(reader);        
>    return
>        new LowerCaseFilter(version,
>            new SeparationFilter(version,
>                new WhitespaceTokenizer(version,
>                    new HTMLStripFilter(charStream))));
> }
> 
> But this method is final now and I am not sure how to understand the 
> following note from the change log: 
> 
> "ReusableAnalyzerBase has been renamed to Analyzer. All Analyzer 
> implementations must now use Analyzer.TokenStreamComponents, rather than 
> overriding .tokenStream() and .reusableTokenStream() (which are now final). "
> 
> There is another problem in the method quoted above: 
> 
> 2.    "The method get(Reader) is undefined for the type CharReader"
> 
> There seem to have been some considerable changes here, too. 
> 
> 3.    "TermPositionVector cannot be resolved to a type"
> 
> This class is gone now in Lucene 4. Are there any simple fixes for this? From 
> the change log: "The term vectors APIs (TermFreqVector, TermPositionVector, 
> TermVectorMapper) have been removed in favor of the above flexible indexing 
> APIs, presenting a single-document inverted index of the document from the 
> term vectors."
> 
> 4.    "The method getTermFreqVector(int, String) is undefined for the type IndexReader"
> 
> This is probably related to item 3.
> 
> Both problems occur here, for instance: 
> 
> TermPositionVector termVector = (TermPositionVector) 
> reader.getTermFreqVector(...);
> 
> ("reader" is of type IndexReader)
> 
> I would appreciate any help with these issues. Thanks a lot in advance.
> 
> Cheers, 
> 
> Martin
> 
> 
> 
> PS: FYI, I have posted the same question on Stackoverflow: 
> http://stackoverflow.com/questions/27881296/upgrading-lucene-from-3-5-to-4-10-how-to-handle-java-api-changes?noredirect=1#comment44166161_27881296
>
