Hi,
Thanks for your reply Paul.
Yes, this was a delicate point.
I gave up indexing multi-word synonyms as single tokens for the
reason you pointed out.
To handle PhraseQueries, I change the positions of the Terms that
follow the synonyms.
For instance for the PhraseQuery "jsp and vb develope
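
The snippet is cut off before its example, so the following is only one possible reading of "changing the positions of the terms that follow the synonyms", written as a plain-Python toy model rather than Lucene code. The function names, the synonym table and the idea of giving each phrase term an explicit relative position are all assumptions made for illustration.

```python
def index_with_synonyms(words, synonyms):
    """Toy positional index: {term: set of positions}.

    Each word of a multi-word synonym gets its own position, and the
    terms that follow are shifted past the longest synonym (assumption).
    """
    postings = {}
    pos = 0
    for word in words:
        postings.setdefault(word, set()).add(pos)
        width = 1
        for phrase in synonyms.get(word, []):
            parts = phrase.split()
            for offset, part in enumerate(parts):
                postings.setdefault(part, set()).add(pos + offset)
            width = max(width, len(parts))
        pos += width
    return postings


def phrase_matches(postings, positioned_terms):
    """positioned_terms is a list of (term, relative_position) pairs."""
    first_term, first_rel = positioned_terms[0]
    for start in postings.get(first_term, ()):
        base = start - first_rel
        if all(base + rel in postings.get(term, ())
               for term, rel in positioned_terms):
            return True
    return False


postings = index_with_synonyms(
    "i love jsp and tomcat".split(),
    {"jsp": ["java server pages"]},  # hypothetical synonym table
)

# "and" now sits three positions after "jsp", not one.
naive = phrase_matches(postings, [("jsp", 0), ("and", 1)])
shifted = phrase_matches(postings, [("jsp", 0), ("and", 3)])
expanded = phrase_matches(
    postings, [("java", 0), ("server", 1), ("pages", 2), ("and", 3)])
```

In this model the phrase written with the expanded synonym matches as-is, while a phrase that uses the short form only matches once the query side applies the same shift. That is the delicate part: index and query have to agree on the gap.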
I knew there was a catch...
I do think, however, that the point is a delicate one which would
deserve consideration: multi-word synonyms are quite common!
paul
On 29 Apr 05, at 18:47, Paul Smith wrote:
Indexing every multi-word synonym as a single token would introduce
spaces into the tokens. In that
Indexing every multi-word synonym as a single token would introduce
spaces into the tokens. In that case searching for (java) would not
match "i love jsp and tomcat". I think that searching for (java*) would
match.
Rewriting the query is also problematic. If you search for (java
se
Hello,
What about indexing every multi-word synonym as a single token?
Example:
Phrase to index : "i love jsp and tomcat"
Synonyms: "jsp" = "java server pages" = "javaserver pages"
Tokens : i love jsp and tom
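
A small sketch of this proposal, as a plain-Python toy model and not Lucene code, may help show the catch raised elsewhere in the thread: a term search for (java) misses the document, while a prefix search for (java*) finds it. The function names and the way synonym groups are stored are assumptions for illustration.

```python
def index_single_token_synonyms(words, synonym_groups):
    """Return (position, token) pairs; every synonym is added whole,
    spaces included, at the position of the word it stands for."""
    tokens = []
    for pos, word in enumerate(words):
        tokens.append((pos, word))
        for group in synonym_groups:
            if word in group:
                tokens.extend((pos, alt) for alt in group if alt != word)
    return tokens


def term_query(tokens, term):
    # Exact match against whole tokens only.
    return any(tok == term for _, tok in tokens)


def prefix_query(tokens, prefix):
    # Rough stand-in for a trailing-wildcard query such as (java*).
    return any(tok.startswith(prefix) for _, tok in tokens)


groups = [["jsp", "java server pages", "javaserver pages"]]
tokens = index_single_token_synonyms("i love jsp and tomcat".split(), groups)

hit_term = term_query(tokens, "java")                 # False
hit_prefix = prefix_query(tokens, "java")             # True
hit_whole = term_query(tokens, "java server pages")   # True
```

The whole synonym is findable only if the query side also keeps "java server pages" together as one token, which is where query rewriting comes in.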
If I understand correctly... it would be easy to do so if you do not wish to
use phrase matches... you could just add a field (with the same name)
for each token...
I think that, if you wish phrase-matches (or the span-ones) then Lucene
can't help you... but I'm quite a newbie on this topic.
Is the
I have found the previous discussions on multi word synonyms as well as
the section on synonym injection in Hatcher's book, but have not been able
to come up with a satisfactory solution. I am indexing text that has several
multi word synonyms. Some of the synonyms may have single words as on
What drawbacks are there from replacing multiple words with their
corresponding acronym/alias during analysis?
- Wildcard search: [cyber] [ca*] would not match [cybercafe]
- Fuzzy search: [cyber] [cage~] would not match [cybercafe]
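
The wildcard drawback in the list above can be reproduced with a toy analyzer in plain Python (not Lucene code). The alias table, the sample sentence and the function names are assumptions made for illustration; `fnmatchcase` stands in for wildcard matching on a single token.

```python
from fnmatch import fnmatchcase


def analyze_with_alias(text, aliases):
    """Replace each known multi-word phrase with its single-token alias."""
    words = text.lower().split()
    out = []
    i = 0
    while i < len(words):
        for phrase, alias in aliases.items():
            parts = phrase.split()
            if words[i:i + len(parts)] == parts:
                out.append(alias)
                i += len(parts)
                break
        else:
            # No alias starts at this word: keep it unchanged.
            out.append(words[i])
            i += 1
    return out


def phrase_with_wildcards(tokens, patterns):
    """True if consecutive tokens match the patterns one by one."""
    n = len(patterns)
    return any(
        all(fnmatchcase(tok, pat)
            for tok, pat in zip(tokens[start:start + n], patterns))
        for start in range(len(tokens) - n + 1)
    )


tokens = analyze_with_alias("the cyber cafe is open",
                            {"cyber cafe": "cybercafe"})

two_word_wildcard = phrase_with_wildcards(tokens, ["cyber", "ca*"])  # False
alias_only = phrase_with_wildcards(tokens, ["cybercafe"])            # True
```

Once the two original words are gone from the token stream, any query that still addresses them separately has nothing left to match.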
Peter
but how would you set the position increment of a multi-word synonym so
that phrase/span queries will work?
Assuming you have the following "phrase synonym" (and code that
that can find them during Analysis)...
[CyberCafe] => [Cyber] [Cafe]
[IBM] => [International] [B
words to "0" (but that will still result in false positives in
: the "cyber cafe" example) or to pick some high default position increment
: (bigger than the longest multi-word synonym) and use that normally, and
: reserve increments of "1" for words in a multi-word sy
: You'll need some kind of lookup to know how to split a token like
: "cybercafe" into two words - once you've done that it will be easy to
: set the position increment of them to zero so that they overlay the
: original term.
but how would you set the position increment of
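
To make the question concrete, here is a toy model in plain Python (not Lucene code) of what happens when both halves of a decompounded word get a position increment of zero. The lookup table, the sample sentence and the function names are assumptions; the only thing taken from the thread is the idea of overlaying the parts on the original term.

```python
def decompound(words, lookup):
    """Emit (term, position_increment) pairs; parts of a known compound
    get increment 0 so that they overlay the original term."""
    stream = []
    for word in words:
        stream.append((word, 1))
        for part in lookup.get(word, []):
            stream.append((part, 0))
    return stream


def positions_from_increments(stream):
    """Turn increments into absolute positions, starting from -1."""
    pos = -1
    positioned = []
    for term, increment in stream:
        pos += increment
        positioned.append((term, pos))
    return positioned


def exact_phrase(positioned, phrase):
    """True if the phrase terms occur at consecutive positions."""
    postings = {}
    for term, pos in positioned:
        postings.setdefault(term, set()).add(pos)
    return any(
        all(start + i in postings.get(term, ())
            for i, term in enumerate(phrase))
        for start in postings.get(phrase[0], ())
    )


lookup = {"cybercafe": ["cyber", "cafe"]}  # hypothetical split table
positioned = positions_from_increments(
    decompound("visit the cybercafe today".split(), lookup))

compound_ok = exact_phrase(positioned, ["cybercafe", "today"])  # True
split_phrase = exact_phrase(positioned, ["cyber", "cafe"])      # False
odd_match = exact_phrase(positioned, ["the", "cafe"])           # True
```

In this model single-term searches for either half work, but the exact phrase "cyber cafe" fails because both halves share one position, and "the cafe" matches although the text never says that.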
Hi,
> From: Erik Hatcher [mailto:[EMAIL PROTECTED]
> > My problem is, however, that some words need to have alternatives
> > where the word is decomposed / decompounded into two or more words:
> >
> > "FooBar Corp" or "cybercafe"
> >
> > should be found when searching for
> >
> > "Foo Ba*" or
On Apr 11, 2005, at 9:36 AM, Peter Hotm. Nørregaard wrote:
According to "Lucene in Action" it is possible to get synonyms indexed
together with a word by putting multiple words with the same
position-id in the term vector.
My problem is, however, that some words need to have alternatives
where
According to "Lucene in Action" it is possible to get synonyms indexed
together with a word by putting multiple words with the same position-id in
the term vector.
My problem is, however, that some words need to have alternatives where the
word is decomposed / decompounded into two or more wor