Hi Mikhail

Yes, if my text contains "wifi router", and my synonym map includes "wifi 
router", "internet device", then a search for "wifi device" will match, 
because the injected synonym terms sit at the same positions as the original 
terms. While I can see that by the strictest criteria this might be incorrect, 
in practice I would happily see that returned as a match. I wouldn't call it a 
false positive; it's more like an unintended benefit.
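
To make the mechanics concrete, here is a minimal sketch of why the mixed 
phrase matches. It is a toy positional index in plain Java, not Lucene's API: 
the class StackedSynonymDemo and the methods indexWithSynonyms and 
phraseMatches are hypothetical names invented for this illustration, and it 
assumes the synonym phrase has the same number of words as the phrase it 
replaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class StackedSynonymDemo {

    /**
     * Builds a toy positional index: element i holds every term stored at
     * position i. Synonym phrases are stacked onto the positions of the
     * phrase they match (assumption: both phrases have the same word count).
     */
    static List<Set<String>> indexWithSynonyms(String text, Map<String, String> synonyms) {
        String[] tokens = text.toLowerCase().split("\\s+");
        List<Set<String>> positions = new ArrayList<>();
        for (String token : tokens) {
            Set<String> termsAtPosition = new LinkedHashSet<>();
            termsAtPosition.add(token);
            positions.add(termsAtPosition);
        }
        for (Map.Entry<String, String> rule : synonyms.entrySet()) {
            String[] from = rule.getKey().split("\\s+");
            String[] to = rule.getValue().split("\\s+");
            for (int start = 0; start + from.length <= tokens.length; start++) {
                String[] window = Arrays.copyOfRange(tokens, start, start + from.length);
                if (Arrays.equals(window, from)) {
                    // Inject each synonym word at the position of the matching original word.
                    for (int i = 0; i < to.length && start + i < positions.size(); i++) {
                        positions.get(start + i).add(to[i]);
                    }
                }
            }
        }
        return positions;
    }

    /** True if the phrase's words occur at consecutive positions, in order. */
    static boolean phraseMatches(List<Set<String>> positions, String phrase) {
        String[] terms = phrase.toLowerCase().split("\\s+");
        for (int start = 0; start + terms.length <= positions.size(); start++) {
            boolean allMatch = true;
            for (int i = 0; i < terms.length; i++) {
                if (!positions.get(start + i).contains(terms[i])) {
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Set<String>> index =
                indexWithSynonyms("wifi router", Map.of("wifi router", "internet device"));
        for (String query : List.of("wifi router", "internet device", "wifi device", "router wifi")) {
            System.out.println(query + " -> " + phraseMatches(index, query));
        }
    }
}
```

In this model position 0 holds {wifi, internet} and position 1 holds {router, 
device}, so "wifi router", "internet device" and the mixed "wifi device" all 
match, while "router wifi" does not. The index no longer records which stacked 
words belonged to the same phrase, which is the gap being discussed.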

No doubt there are pathological cases where I would not be so happy, but nobody 
has come up with one in our application yet. As I said, there's scope for 
improvement in our implementation, but at this point I'm not convinced that the 
benefit of plugging this gap justifies the cost.

If somebody points you to a better option I would also be interested in seeing 
it.

cheers
T

-----Original Message-----
From: Mikhail Khludnev <m...@apache.org> 
Sent: Tuesday, 3 January 2023 09:55
To: java-user@lucene.apache.org
Subject: Re: Question for SynonymQuery

Hello Trevor.
Can you help me better understand this approach? If we have the text "wifi 
router" and inject "internet device" at indexing time, the terms reside at the 
same positions. How do you avoid a false-positive match for the query "wifi 
device"?

On Mon, Jan 2, 2023 at 4:16 PM Trevor Nicholls <tre...@castingthevoid.com>
wrote:

> Hi Anh
>
> The two links Michael shared relate to questions I asked when I was 
> trying to get synonym matching with our application.
>
> I really do have multi-term synonym matching working at this point; 
> there's always scope for improvement of course, but with the hints 
> supplied in those threads I was able to index our documents and search 
> them using a variety of synonymous terms, both single words and phrases.
>
> Our application does not use either BooleanQuery or SynonymQuery; I 
> have just used the standard QueryParser. Instead, the synonym 
> processing occurs in the indexing phase, which is not only simpler 
> (one search pattern, one query), but I think you would also find 
> that it gives you superior performance (because the synonym processing 
> occurs once at indexing time and not at all during searching, and I'm 
> sure you'll be doing far more searching than indexing).
>
> cheers
> T
>
>
> -----Original Message-----
> From: Michael Wechner <michael.wech...@wyona.com>
> Sent: Thursday, 29 December 2022 08:56
> To: java-user@lucene.apache.org
> Subject: Re: Question for SynonymQuery
>
> Hi Anh
>
> The following Stack Overflow link might help
>
>
> https://stackoverflow.com/questions/73240494/can-someone-assist-me-with-a-multi-word-synonym-problem-in-lucene
>
> The following thread seems to confirm that escaping the space with a 
> backslash does not help
>
> https://lists.apache.org/list?java-user@lucene.apache.org:2022-3
>
> HTH
>
> Michael
>
>
> Am 27.12.22 um 20:22 schrieb Anh Dũng Bùi:
> > Hi Lucene users,
> >
> > I recently came across SynonymQuery and found out that it only 
> > supports single-term synonyms (since it accepts a list of Term which 
> > will be considered as synonyms). We have some multi-term synonyms 
> > like "internet device" <-> "wifi router" or "dns" <-> "domain name 
> > service". Am I right that I need to use something like a 
> > BooleanQuery for these cases?
> >
> > I have 2 other follow-up questions:
> > - Does SynonymQuery have any advantage over BooleanQuery? Or is it 
> > only different in how scores are computed? As I understand 
> > SynonymWeight will consider all terms as exactly the same while 
> > BooleanQuery will favor the documents with more matched terms.
> > - Is it worth it to support multi-term synonyms in SynonymQuery? My 
> > feeling is that it's better to just use BooleanQuery in those cases, 
> > since to support multi-term synonyms it needs to accept a list of 
> > Query, which would make it behave like a BooleanQuery. Also, how 
> > scoring would work with multi-term synonyms is another problem.
> >
> > Thanks & Regards!
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>
>
>

--
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!

