Re: FuzzyQuery

baris . kazar Wed, 12 Jun 2019 08:24:41 -0700

Tomoko,-

Thank you for your suggestions. I am trying to understand them, and I thought I did :)

But it does not work with FuzzyQuery when I use it with a *single* large TextField like street=...value... city=...value... region=...value... country=...value... (with or without quotes around the values).

What I knew about Lucene fuzzy queries does not hold with this TextField form. That is why I suspected a bug.


1. Yes, I saw that and have solid proof of it now.

2. Yes, but FuzzyQuery takes the quotes as they are, since they are escaped and the term is not analyzed.

Stuffing everything into one TextField versus having separate fields should probably affect only the performance, not the outcome, in my case. But I have been thinking about this, and maybe separate fields are the way to go here.

My CONTENT field has street names in mixed case and city, region, and country names in UPPERCASE. Can this be a problem?

I thought the index stored them in lowercase since I am using StandardAnalyzer.

The CONTENT field also has the full TextField string with street=... city=... region=... country=... (here all values are UPPERCASE).

Why can't the index find the names via FuzzyQuery? I tried both FuzzyQuery and the query builder, as I showed before.
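
One possible explanation, sketched under assumptions rather than checked against this index: a FuzzyQuery built directly from a Term is not analyzed, so an uppercase query term is compared with the lowercased terms that StandardAnalyzer wrote to the index. The stdlib-only Java below uses a plain Levenshtein distance as an illustrative stand-in (it is not Lucene's automaton-based implementation, it ignores transpositions, and the class name FuzzyCaseDemo is hypothetical). FuzzyQuery accepts at most 2 edits, so the distance to an assumed indexed term "main" decides whether a match is possible.

```java
import java.util.Locale;

final class FuzzyCaseDemo {
    /** Plain Levenshtein distance: insertions, deletions, substitutions (case-sensitive). */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        String indexedTerm = "main"; // assumption: what StandardAnalyzer stored for MAIN
        // Uppercase query term: every character differs, so the distance is 5 (> 2).
        System.out.println(distance("MAINS", indexedTerm));
        // Lowercased query term: one deletion, so the distance is 1 (<= 2).
        System.out.println(distance("MAINS".toLowerCase(Locale.ROOT), indexedTerm));
    }
}
```

If this is what happens here, lowercasing the term before constructing the FuzzyQuery, or going through a QueryParser so that the analyzer is applied, should bring the query term within range.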

The last piece of advice in your previous email would go nicely outside the parentheses, since it might be very critical :) :) :)


Best regards


On 6/12/19 12:17 AM, Tomoko Uchida wrote:

I'd suggest correctly understanding how a piece of software works before
suspecting a bug in it :-)

I guess you may be missing two points:

1. The standard analyzer (standard tokenizer) breaks words at the double
quote (U+0022), so quotes are not indexed or searched at all if you are
using the standard analyzer. (That is the reason you get the same results
with or without quotes.)
See:
https://lucene.apache.org/core/8_1_0/core/org/apache/lucene/analysis/standard/StandardTokenizer.html
and
http://unicode.org/reports/tr29/

2. The double quote has a special meaning (it is interpreted as a phrase
query) in the built-in query parser, so you need to escape it if you want
to search for the double quotes themselves.
See:
http://lucene.apache.org/core/8_1_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Terms
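
To make point 1 concrete, here is a rough stdlib-only Java sketch. It is a simplified stand-in for the standard analyzer, not Lucene code: it splits on every character that is not a letter or digit and lowercases the pieces, whereas the real StandardTokenizer follows the UAX #29 word-break rules. The class name SimpleTokenizerDemo is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class SimpleTokenizerDemo {
    /** Splits on any run of non-letter, non-digit characters and lowercases each piece. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Both inputs print [street, main, city, nashua]: the quotes and '=' are dropped.
        System.out.println(tokenize("street=\"MAIN\" city=\"NASHUA\""));
        System.out.println(tokenize("street=MAIN city=NASHUA"));
    }
}
```

Under this simplification the quoted and unquoted forms produce identical terms, which is consistent with getting the same results with or without quotes.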

(My advice would be to create a separate field for each key-value pair
instead of stuffing all pairs into one text field, if you need to
search them separately.)
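
A minimal sketch of that idea, assuming the combined string always has the key="value" shape shown elsewhere in this thread: the stdlib-only Java below splits it into a map with one entry per key. The class and method names (KeyValueFieldsDemo, splitFields) are hypothetical, and the step of adding each entry to a Lucene document as its own field is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class KeyValueFieldsDemo {
    // Matches key="value"; values may contain spaces but no embedded double quotes.
    private static final Pattern PAIR = Pattern.compile("(\\w+)=\"([^\"]*)\"");

    /** Returns the key-value pairs in their original order. */
    static Map<String, String> splitFields(String combined) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(combined);
        while (m.find()) {
            fields.put(m.group(1), m.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        String combined = "street=\"MAIN\" city=\"NASHUA\" municipality=\"HILLSBOROUGH\" "
                + "region=\"NEW HAMPSHIRE\" country=\"UNITED STATES\"";
        // Each entry could then be indexed as its own field, e.g. "street" -> "MAIN".
        splitFields(combined).forEach((key, value) -> System.out.println(key + " -> " + value));
    }
}
```

With one field per key, a fuzzy query on the street value no longer competes with city, region, or country terms in the same field.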

On Wed, Jun 12, 2019 at 2:39, <[email protected]> wrote:

I can say that the quotes are not the issue with the index, as I still get
the same results with or without quotes.

I am starting to feel that this might be a bug, maybe??

Best regards


On 6/10/19 2:46 PM, [email protected] wrote:

Somehow the " is causing an issue, as this should return the street with MAIN:

[contentDFLT:street="MAINS"~2, +contentDFLT:"city nashua",
+contentDFLT:"region new-hampshire", +contentDFLT:"country united
states"] -> this was with fuzzyquery on MAINS

Best regards


On 6/10/19 2:24 PM, [email protected] wrote:

[+contentDFLT:"city nashua", +contentDFLT:"region new-hampshire",
+contentDFLT:"country united states", contentDFLT:street
contentDFLT:mains]

QueryParser chops it into two pieces from
parser.parse("street=\"MAINS\"");

The index has a TextField named contentDFLT with the following data:
street="MAIN" city="NASHUA" municipality="HILLSBOROUGH" region="NEW
HAMPSHIRE" country="UNITED STATES"


When I set street=\"MAINS~\" with the parser,
I get the following:
[+contentDFLT:"city nashua", +contentDFLT:"region new-hampshire",
+contentDFLT:"country united states", contentDFLT:street
contentDFLT:mains]

Probably the " quotation marks are messing this up, as you were saying...
Best regards


On 6/10/19 12:48 PM, Tomoko Uchida wrote:

Or, " (double quotation) in your query string may affect query parsing.

When I parse this string with the classic query parser (Lucene 8.1),
street="MAINS~"
the parsed (raw) query is
text:street text:mains
(I set the default search field to "text", so text:xxxx appears
here.)

Query parsing is a complex process, so it would be good to check the
parsed raw query string, especially when you have (reserved) special
characters in your query...

On Tue, Jun 11, 2019 at 1:10, Tomoko Uchida <[email protected]> wrote:

Hi,

I noticed one small thing in your previous mail.

when i use q1 = parser.parse("street=\"MAIN\""); i get same results

which is good.

To specify a search field, ":" (colon) should be used instead of "=".
See the query parser documentation:
http://lucene.apache.org/core/8_1_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Fields


I'm not sure this is related to your problem.

On Tue, Jun 11, 2019 at 0:51, <[email protected]> wrote:

booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "city=\"NASHUA\""),
        BooleanClause.Occur.MUST);
booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "region=\"NEW HAMPSHIRE\""),
        BooleanClause.Occur.MUST);
booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "country=\"UNITED STATES\""),
        BooleanClause.Occur.MUST);

org.apache.lucene.queryparser.classic.QueryParser parser =
        new org.apache.lucene.queryparser.classic.QueryParser(field, phraseAnalyzer);
Query q1 = null;
try {
    q1 = parser.parse("MAIN");
} catch (ParseException e) {
    e.printStackTrace();
}
booleanQuery.add(q1, BooleanClause.Occur.SHOULD);

testQuerySearch2 Time to compute: 0 seconds
Number of results: 1775
Name: Main St
Score: 37.20959
ID: 12681979
Country Code: US
Coordinates: 42.76416, -71.46681
Search Key: street="MAIN" city="NASHUA" municipality="HILLSBOROUGH"
region="NEW HAMPSHIRE" country="UNITED STATES"

Name: Main St
Score: 37.20959
ID: 12681977
Country Code: US
Coordinates: 42.747, -71.45957
Search Key: street="MAIN" city="NASHUA" municipality="HILLSBOROUGH"
region="NEW HAMPSHIRE" country="UNITED STATES"

Name: Main St
Score: 37.20959
ID: 12681978
Country Code: US
Coordinates: 42.73492, -71.44951
Search Key: street="MAIN" city="NASHUA" municipality="HILLSBOROUGH"
region="NEW HAMPSHIRE" country="UNITED STATES"

When I use q1 = parser.parse("street=\"MAIN\""); I get the same results,
which is good.

But when I switch to MAINS~, the fuzzy query does not work.

I need to say something about using only q1 in the BooleanQuery:
it tries to match MAIN in street, city, region, and country, which are
all in a single TextField.
But I don't want this. That is why I need street="..." etc. when
searching.

Best regards



On 6/10/19 11:31 AM, Tomoko Uchida wrote:

Hi,

just for the basic verification, can you find the document without
fuzzy query? I mean, does this query work for you?

Query query = parser.parse("MAIN");

Tomoko

On Tue, Jun 11, 2019 at 0:22, <[email protected]> wrote:

Why does the second set not work at all?

It is indexed as a TextField like street="..." city="..." etc.

Best regards



On 6/10/19 11:23 AM, [email protected] wrote:

I don't know how to use FuzzyQuery with QueryParser, but you are
probably suggesting

QueryParser parser = new QueryParser(field, analyzer) ;
Query query = parser.parse("MAINS~2");

booleanQuery.add(query, BooleanClause.Occur.SHOULD);

am i right?
Best regards


On 6/10/19 10:47 AM, Atri Sharma wrote:

I would suggest using a QueryParser for your fuzzy query before
adding it to the Boolean query. This should weed out any case
issues.

On Mon, 10 Jun 2019 at 8:06 PM, <[email protected]> wrote:

       BooleanQuery.Builder booleanQuery = new BooleanQuery.Builder();

       // First set
       booleanQuery.add(new FuzzyQuery(new org.apache.lucene.index.Term(field, "MAINS")),
               BooleanClause.Occur.SHOULD);
       booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "NASHUA"),
               BooleanClause.Occur.MUST);
       booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "NEW HAMPSHIRE"),
               BooleanClause.Occur.MUST);
       booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "UNITED STATES"),
               BooleanClause.Occur.MUST);

       // Second set
       //booleanQuery.add(new FuzzyQuery(new org.apache.lucene.index.Term(field, "street=\"MAINS\"")),
       //        BooleanClause.Occur.SHOULD);
       //booleanQuery.add(Utils.createPhraseQueryFullText(phraseAnalyzer, field, "city=\"NASHUA\""),
       //        BooleanClause.Occur.MUST);
       //booleanQuery.add(Utils.createPhraseQueryFullText(phraseAnalyzer, field, "region=\"NEW HAMPSHIRE\""),
       //        BooleanClause.Occur.MUST);
       //booleanQuery.add(Utils.createPhraseQueryFullText(phraseAnalyzer, field, "country=\"UNITED STATES\""),
       //        BooleanClause.Occur.MUST);

       The first set also brings back streets with Nashua in the name
       (NASHUA).

       So, to prevent that, and since I also indexed with street="..."
       city="...", I tried the second set, but it does not return anything.

       createPhraseQuery builds a PhraseQuery with one term equal to the
       string in the call.
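
If the helper really passes the whole string as one unanalyzed term, that alone could explain the empty result of the second set. The stdlib-only Java sketch below models the index as a plain set of terms; the term list is an assumption about what StandardAnalyzer would store for the sample data, not output read from the actual index, and the class name TermLookupDemo is hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class TermLookupDemo {
    /** Exact term lookup, standing in for how a single-term query is matched. */
    static boolean hasTerm(Set<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }

    public static void main(String[] args) {
        // Assumed terms for: street="MAIN" city="NASHUA" region="NEW HAMPSHIRE"
        Set<String> indexedTerms = new HashSet<>(Arrays.asList(
                "street", "main", "city", "nashua", "region", "new", "hampshire"));

        // The raw string is never stored as one term, so this lookup fails.
        System.out.println(hasTerm(indexedTerms, "city=\"NASHUA\"")); // false
        // The analyzed, lowercased pieces are stored individually.
        System.out.println(hasTerm(indexedTerms, "city"));   // true
        System.out.println(hasTerm(indexedTerms, "nashua")); // true
    }
}
```

In that case the phrase would need to be built from the analyzed tokens (for example the two terms city and nashua in sequence) rather than from the raw string.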

       Best regards



       On 6/10/19 10:47 AM, [email protected] wrote:
       > How do I check how it is indexed? Lowercase or uppercase?
       >
       > The only way now is by testing.
       >
       > I am using StandardAnalyzer.
       >
       > Best regards
       >
       >
       > On 6/9/19 11:57 AM, Atri Sharma wrote:
       >> On Sun, Jun 9, 2019 at 8:53 PM Tomoko Uchida <[email protected]> wrote:
       >>> Hi,
       >>>
       >>> What analyzer do you use for the text field? Is the term "Main"
       >>> correctly indexed?
       >> Agreed. Also, it would be good if you could post your actual code.
       >>
       >> What analyzer are you using? If you are using StandardAnalyzer, then
       >> all of your terms while indexing will be lowercased, AFAIK, but your
       >> query will not be analyzed until you run a QueryParser on it.
       >>
       >>
       >> Atri
       >>
       >
       >
       >
---------------------------------------------------------------------

       > To unsubscribe, e-mail:
[email protected]
<mailto:[email protected]>
       > For additional commands, e-mail:
       [email protected]
<mailto:[email protected]>
       >
