On 18.11.2016 at 08:58, Bernd Fehling wrote:
> Hi Mike,
>
> let me explain.
>
> First, after looking deeper inside I noticed that the filters are used
> like a stack and called backwards. So the first incrementToken goes
> to the last filter in the chain. That one also uses incrementToken and
> calls its predecessor in the chain, and so on.
> So everything following SynonymFilter in the chain only gets its
> "knowledge" from the token and its attributes. As a result, a
> "hasSynonyms" function in SynonymFilter makes no sense. The only
> solution would be another token attribute, and my first assumption was
> wrong.
>
> Second, what has changed between 4.10.4 and 6.3.0?
> In 4.10.4 SynonymFilter "produced" SYNONYMs which also contained the
> original token, and the first synonym in line had positionIncrement set.
> synonym.txt: bar, foo, foo\ bar, baz

That was a typo; the correct line is:
synonym.txt: foo, foo\ bar, baz

> IN: foo(shingle)posInc=1
> OUT: foo(shingle)posInc=1, foo(SYNONYM)posInc=1, "foo bar"(SYNONYM)posInc=0,
>      baz(SYNONYM)posInc=0
>
> In 6.3.0 the output is different.
> IN: foo(shingle)posInc=1
> OUT: foo(shingle)posInc=1, "foo bar"(SYNONYM)posInc=0, baz(SYNONYM)posInc=0
>
> In 4.10.4 we just dropped the shingles and everything was fine.
> The positionIncrement was correct, and the incoming shingle which generated
> the SYNONYMs was also included as a SYNONYM, because it can also be called
> a SYNONYM: it is equal to all other synonyms in synonym.txt.
>
> Now in 6.3.0 this is quite difficult and not as easy as it was.
> - I can't drop all shingles.
> - Because of this kind of stack calling of the filters I can't predict
>   whether a shingle produced SYNONYMs.
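To make the difference concrete, here is a minimal sketch in plain Java. It
does not use any Lucene classes: Tok, DropShingles and ShingleDropDemo are
hypothetical names invented for this illustration, and the token lists are
simply the two OUT lines quoted above. The downstream stage pulls tokens from
its predecessor and sees only the token itself, which is the situation
described for the filter chain. As a simplification, the model does not carry
position increments of dropped tokens forward.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Plain-Java stand-in for a token and its attributes (not Lucene's API). */
final class Tok {
    final String term;
    final String type;
    final int posInc;

    Tok(String term, String type, int posInc) {
        this.term = term;
        this.type = type;
        this.posInc = posInc;
    }

    @Override
    public String toString() {
        return term + "(" + type + ")posInc=" + posInc;
    }
}

/** Hypothetical downstream stage: drops every token of type "shingle". */
final class DropShingles implements Iterator<Tok> {
    private final Iterator<Tok> input;
    private Tok pending;

    DropShingles(Iterator<Tok> input) {
        this.input = input;
    }

    @Override
    public boolean hasNext() {
        // Pull from the predecessor until a non-shingle token shows up.
        while (pending == null && input.hasNext()) {
            Tok t = input.next();
            if (!"shingle".equals(t.type)) {
                pending = t;
            }
        }
        return pending != null;
    }

    @Override
    public Tok next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        Tok t = pending;
        pending = null;
        return t;
    }
}

final class ShingleDropDemo {
    /** Runs the drop stage over a list standing in for SynonymFilter output. */
    static List<String> run(List<Tok> synonymFilterOutput) {
        List<String> out = new ArrayList<>();
        Iterator<Tok> chain = new DropShingles(synonymFilterOutput.iterator());
        while (chain.hasNext()) {
            out.add(chain.next().toString());
        }
        return out;
    }

    /** OUT line quoted for 4.10.4. */
    static List<Tok> output4104() {
        return Arrays.asList(
            new Tok("foo", "shingle", 1),
            new Tok("foo", "SYNONYM", 1),
            new Tok("foo bar", "SYNONYM", 0),
            new Tok("baz", "SYNONYM", 0));
    }

    /** OUT line quoted for 6.3.0. */
    static List<Tok> output630() {
        return Arrays.asList(
            new Tok("foo", "shingle", 1),
            new Tok("foo bar", "SYNONYM", 0),
            new Tok("baz", "SYNONYM", 0));
    }
}
```

With the 4.10.4 list, ShingleDropDemo.run(...) still returns "foo" as the
first token with posInc=1. With the 6.3.0 list, "foo" is gone and the first
surviving token has posInc=0, which is why dropping all shingles no longer
works.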
>
> Either I have a token attribute which tells me that the shingle coming out
> of SynonymFilter has produced SYNONYMs (and I should not drop it because it
> is no longer in the SYNONYM result),
> or I have to use caching, wait until incrementToken returns false, and then
> parse through all results and clean up.
>
> Because of this backwards calling (stack building) of filters I would
> suggest another token attribute which tells me whether something going into
> SynonymFilter has produced SYNONYMs, which will follow next.
>
> What do you think, any other idea?
> As I mentioned, this is for a special solution and probably not very common.
>
> Regards
> Bernd
>
>
> On 17.11.2016 at 22:17, Michael McCandless wrote:
>> Hmm are you saying SynonymFilter in 4.10.4 has this capability but
>> 6.3.0 lost it?
>>
>> So if you have a synonym "wow that's funny" -> "wtf", you want the
>> token for "wow" to state that it has a synonym?
>>
>> Using the PositionLengthAttribute you should be able to reconstruct
>> this, because when you see "wtf" with position length 3, you know it
>> spanned "wow", "that's", "funny".
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Thu, Nov 17, 2016 at 10:22 AM, Bernd Fehling
>> <bernd.fehl...@uni-bielefeld.de> wrote:
>>> Currently I'm tackling a problem with SynonymFilter while going from 4.10.4
>>> to 6.3.0.
>>>
>>> For a special solution I need to know if a word (or multiword) is producing
>>> synonyms in SynonymFilter.
>>>
>>> Therefore I suggest the enhancement of "hasSynonyms" for SynonymFilter.
>>>
>>> A workaround would be to buffer all results from SynonymFilter and check
>>> whether the token following a word or multiword (of any type) is a SYNONYM.
>>>
>>> A function "hasSynonyms" in SynonymFilter would make things easy :-)
>>>
>>> What do you think about this?
>>>
>>> Regards
>>> Bernd
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
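The buffering workaround from the original mail (collect everything coming
out of SynonymFilter, then check whether the token after a word or multiword
is a SYNONYM) could look roughly like the sketch below. It is plain Java
without Lucene classes; BufferedTok and SynonymMarker are hypothetical names
made up for this illustration. A real filter would have to capture and
restore the attribute state of each token instead of using a simple list.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java stand-in for one buffered token (not Lucene's API). */
final class BufferedTok {
    final String term;
    final String type;
    final int posInc;

    BufferedTok(String term, String type, int posInc) {
        this.term = term;
        this.type = type;
        this.posInc = posInc;
    }
}

final class SynonymMarker {
    /**
     * For each buffered token, reports whether it is a non-SYNONYM token
     * that is directly followed by a token of type "SYNONYM".
     */
    static List<Boolean> hasSynonyms(List<BufferedTok> buffered) {
        List<Boolean> flags = new ArrayList<>();
        for (int i = 0; i < buffered.size(); i++) {
            BufferedTok current = buffered.get(i);
            boolean nextIsSynonym = i + 1 < buffered.size()
                && "SYNONYM".equals(buffered.get(i + 1).type);
            flags.add(!"SYNONYM".equals(current.type) && nextIsSynonym);
        }
        return flags;
    }
}
```

For the 6.3.0 output quoted above, only the leading foo(shingle) token would
be flagged. The price is that nothing can be emitted until the whole stream
has been consumed, which is the reason for preferring a token attribute.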
-- 
*************************************************************
Bernd Fehling                    Bielefeld University Library
Dipl.-Inform. (FH)                LibTec - Library Technology
Universitätsstr. 25                  and Knowledge Management
33615 Bielefeld
Tel. +49 521 106-4060       bernd.fehling(at)uni-bielefeld.de

BASE - Bielefeld Academic Search Engine - www.base-search.net
*************************************************************