Hi Uwe,

Thanks a lot for the detailed reply. I'll see how far I get with it, but being quite new to Lucene, it seems I am lacking a bit of background information to fully understand the response below. In particular, I guess I need to do some background reading on how token streams and readers work.

Cheers,
Martin

On 11.01.2015 at 11:05, Uwe Schindler <u...@thetaphi.de> wrote:

> Hi,
>
> First, there is also a migration guide next to the change log:
> http://lucene.apache.org/core/4_10_3/MIGRATE.html
>
> 1. If you implement an Analyzer, you have to override createComponents(),
> which returns a TokenStreamComponents object. See other Analyzers' source
> code to understand how to use it. One simple example is in the Javadocs:
> http://lucene.apache.org/core/4_10_3/core/org/apache/lucene/analysis/Analyzer.html
>
> 2. Use initReader() to wrap filters around readers. This method is protected
> and can be overridden. CharFilter extends Reader, so you can wrap any
> CharFilter there. Your HTMLStripCharsFilter has to be wrapped around the
> given reader here.
>
> 3./4. Term vectors are different in Lucene 4. Basically, term vectors are a
> small index for each document, and this is how it's implemented. You get
> back Fields/Terms instances, which are basically like AtomicReader's
> backend; you can even execute a Query on the vectors.
>
> IndexReader#getTermVector() returns Terms for a specific field:
> <http://lucene.apache.org/core/4_10_3/core/org/apache/lucene/index/IndexReader.html#getTermVector(int,%20java.lang.String)>
>
> For all fields (harder to use, because unwrapping for a specific field is
> what the method above does; this one is more for executing Queries and so on):
> <http://lucene.apache.org/core/4_10_3/core/org/apache/lucene/index/IndexReader.html#getTermVectors(int)>
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> From: Martin Wunderlich [mailto:martin.wunderl...@gmx.net]
> Sent: Sunday, January 11, 2015 9:18 AM
> To: java-user@lucene.apache.org
> Subject: Upgrading Lucene from 3.5 to 4.10 - how to handle Java API changes
>
> Hi all,
>
> I am currently in the process of upgrading a search engine application from
> Lucene 3.5.0 to version 4.10.3. There have been some substantial API changes
> in version 4 that break backward compatibility. I have managed to fix most of
> them, but a few issues remain that I could use some help with:
>
> 1. "cannot override final method from Analyzer"
>
> The original code extended the Analyzer class and then overrode
> tokenStream(...).
>
>     @Override
>     public TokenStream tokenStream(String fieldName, Reader reader) {
>         CharStream charStream = CharReader.get(reader);
>         return
>             new LowerCaseFilter(version,
>                 new SeparationFilter(version,
>                     new WhitespaceTokenizer(version,
>                         new HTMLStripFilter(charStream))));
>     }
>
> But this method is final now and I am not sure how to understand the
> following note from the change log:
>
> "ReusableAnalyzerBase has been renamed to Analyzer. All Analyzer
> implementations must now use Analyzer.TokenStreamComponents, rather than
> overriding .tokenStream() and .reusableTokenStream() (which are now final)."
>
> There is another problem in the method quoted above:
>
> 2. "The method get(Reader) is undefined for the type CharReader"
>
> There seem to have been some considerable changes here, too.
>
> 3. "TermPositionVector cannot be resolved to a type"
>
> This class is gone now in Lucene 4. Are there any simple fixes for this? From
> the change log: "The term vectors APIs (TermFreqVector, TermPositionVector,
> TermVectorMapper) have been removed in favor of the above flexible indexing
> APIs, presenting a single-document inverted index of the document from the
> term vectors."
>
> Probably related to this:
>
> 4. "The method getTermFreqVector(int, String) is undefined for the type
> IndexReader."
>
> Both problems occur here, for instance:
>
>     TermPositionVector termVector = (TermPositionVector)
>         reader.getTermFreqVector(...);
>
> ("reader" is of type IndexReader)
>
> I would appreciate any help with these issues. Thanks a lot in advance.
>
> Cheers,
>
> Martin
>
> PS: FYI, I have posted the same question on Stack Overflow:
> http://stackoverflow.com/questions/27881296/upgrading-lucene-from-3-5-to-4-10-how-to-handle-java-api-changes?noredirect=1#comment44166161_27881296
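
As background for point 2 of the reply: the reason a char filter can be returned from initReader() is that it is itself a Reader, so whatever consumes the stream downstream never notices the wrapping. The following is a minimal sketch of that idea using only java.io. It does not use Lucene at all; TagStrippingReader and the strip() helper are hypothetical names, and the tag removal is deliberately naive (no entities, comments, or offset correction), so it should not be read as a substitute for a real HTML-stripping char filter.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class ReaderWrappingSketch {

    /** Hypothetical stand-in for a char filter: drops everything between '<' and '>'. */
    static class TagStrippingReader extends FilterReader {
        private boolean insideTag = false;

        TagStrippingReader(Reader in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int c;
            while ((c = in.read()) != -1) {
                if (c == '<') {
                    insideTag = true;          // start skipping
                } else if (c == '>' && insideTag) {
                    insideTag = false;         // tag closed, resume output
                } else if (!insideTag) {
                    return c;                  // ordinary character, pass through
                }
            }
            return -1;                         // underlying reader is exhausted
        }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            if (len == 0) {
                return 0;
            }
            int n = 0;
            while (n < len) {
                int c = read();
                if (c == -1) {
                    break;
                }
                buf[off + n++] = (char) c;
            }
            return n == 0 ? -1 : n;
        }
    }

    /** Wraps the input in the filtering reader and drains it, as a tokenizer would. */
    static String strip(String html) {
        try (Reader wrapped = new TagStrippingReader(new StringReader(html))) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[64];
            int n;
            while ((n = wrapped.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(strip("<p>Hello <b>Lucene</b></p>"));
    }
}
```

The consumer in strip() only ever sees a plain Reader. In the Lucene setting described in the reply, the equivalent wrapping would happen inside an overridden initReader(), and the tokenizer created in createComponents() would then read the already filtered characters.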
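
For points 3 and 4, it may help to picture what "a small index for each document" means before looking at the Terms API. The sketch below builds such a single-document inverted index by hand in plain Java: each term maps to the list of positions where it occurs, and the term frequency is simply the length of that list. This is a conceptual illustration only. The class and method names are hypothetical, the whitespace/lower-case tokenization is an assumption, and none of this is Lucene's actual data structure; in Lucene 4 the corresponding information would come from the Terms returned by IndexReader#getTermVector(int, String).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class TermVectorSketch {

    /**
     * Builds term -> positions for the text of one field of one document.
     * Tokens are split on whitespace and lower-cased (an assumption for this sketch).
     */
    static Map<String, List<Integer>> termVector(String fieldText) {
        Map<String, List<Integer>> postings = new TreeMap<>(); // sorted by term
        String trimmed = fieldText.trim();
        if (trimmed.isEmpty()) {
            return postings;
        }
        String[] tokens = trimmed.split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            String term = tokens[pos].toLowerCase(Locale.ROOT);
            postings.computeIfAbsent(term, k -> new ArrayList<>()).add(pos);
        }
        return postings;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> tv = termVector("to be or not to be");
        for (Map.Entry<String, List<Integer>> e : tv.entrySet()) {
            // frequency of a term within this document = number of recorded positions
            System.out.println(e.getKey() + " freq=" + e.getValue().size()
                    + " positions=" + e.getValue());
        }
    }
}
```

Seen this way, the old TermPositionVector (terms, frequencies, positions for one document) and the new Terms view carry the same kind of information; the difference is that Lucene 4 exposes it through the same enumeration-style API it uses for the main index.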