If you search for "A B" (with quotes), it is correct. If you search for A B
(without quotes), it is not, because by default the query parser creates
"OR" queries.
So searching for A B finds all documents that contain A or B, while
searching for only A or only B you will normally find le
Hi,
If I search for the string "A B" (i.e. A followed by a space followed by B) and
get 20 hits, is it correct to expect that if I search for "A" (i.e. only
A), I will get at least 20 hits? Similarly, if I search for B, will I
get 20 hits or more?
--Hrishi
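To make the phrase-versus-OR distinction concrete, here is a minimal sketch of a toy positional index. It is not Lucene's implementation; it assumes lowercase whitespace tokenization with no stop words or stemming, and the helper names (build_index, term_hits, or_hits, phrase_hits) are made up for illustration. Under those assumptions, every document that matches the phrase "A B" also matches the single term A, and every document that matches A also matches the OR query A B.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to {doc_id: [positions]} using lowercase whitespace tokens."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index

def term_hits(index, term):
    """Documents containing the term anywhere."""
    return set(index.get(term.lower(), {}))

def or_hits(index, terms):
    """Documents containing at least one of the terms (unquoted default query)."""
    hits = set()
    for term in terms:
        hits |= term_hits(index, term)
    return hits

def phrase_hits(index, terms):
    """Documents containing the terms at consecutive positions (quoted query)."""
    terms = [t.lower() for t in terms]
    if not terms:
        return set()
    candidates = set.intersection(*(term_hits(index, t) for t in terms))
    hits = set()
    for doc_id in candidates:
        for start in index[terms[0]][doc_id]:
            if all(start + i in index[t][doc_id] for i, t in enumerate(terms)):
                hits.add(doc_id)
                break
    return hits

docs = ["a b c", "b a", "a only", "b only", "x a b"]
index = build_index(docs)

phrase = phrase_hits(index, ["a", "b"])   # quoted: "a b"
only_a = term_hits(index, "a")            # single term: a
either = or_hits(index, ["a", "b"])       # unquoted: a b (OR)

print(len(phrase), len(only_a), len(either))
```

The subset relation only holds when both queries are analyzed the same way; an analyzer that drops or rewrites terms could change the counts.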
Thanks all,
but how does Nutch handle this problem? I am aware of Nutch, but not in
depth. If I search for the keyword "about us", Nutch gives me exactly what I
want. Are there any scoring techniques? Please let me know.
(sorry, tangent. I'll be quick)
On Tue, Aug 4, 2009 at 8:42 AM, Shai Erera wrote:
> Interesting ... I don't have access to a Japanese dictionary, so I just
> extract bi-grams.
Shai - if you're interested in parsing Japanese, check out Kakasi. It
can split into words and convert Kanji->Katakana/Hi
may hurt recall severely.
Shai
On Tue, Aug 4, 2009 at 7:34 PM, N Hira wrote:
>
> Good summary, Shai.
>
> I've missed some of this thread as well, but does anyone know what happened
> to the suggestion about query manipulation?
>
> e.g., query (about us) => query("abo
"creditcard")
Regards,
-h
- Original Message
From: Shai Erera
To: java-user@lucene.apache.org
Sent: Tuesday, August 4, 2009 10:31:46 AM
Subject: Re: Searching doubt
Hi Darren,
The question was, how given a string "aboutus" in a document, you can return
that document a
Well.. search on both anyhow.
"about us" OR "aboutus" should hit the spot I think.
Matt
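As a rough sketch of the "search on both" idea, the helper below rewrites a multi-word user query into a phrase clause ORed with the same words run together. The function name expand_query is hypothetical, and the output assumes the classic query parser syntax of quoted phrases joined by OR. It does not escape special query characters, so treat it as an illustration only.

```python
def expand_query(user_query):
    """Return '"w1 w2" OR "w1w2"' for multi-word input; pass single words through."""
    words = user_query.split()
    if len(words) < 2:
        # Nothing to join, so the query is returned trimmed and unchanged.
        return user_query.strip()
    phrase = " ".join(words)   # the words as an exact phrase
    joined = "".join(words)    # the same words with the spaces removed
    return f'"{phrase}" OR "{joined}"'

print(expand_query("about us"))
print(expand_query("credit cards"))
```

This handles the query side only. It finds "aboutus" in a document when the user types "about us", but it cannot tell whether the joined form is a real word, so it may add clauses that never match anything.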
> The question was, how given a string "aboutus" in a document, you can return
> that document as a result to the query "about us" (note the space). So we're
> mostly discussing how to detect and then break the word "aboutus" to two
> words.
I haven't really been following this thread so apologies
Interesting ... I don't have access to a Japanese dictionary, so I just
extract bi-grams. But I guess that in this case, if one can access an
English dictionary (are you aware of an "open-source" one, or free one
BTW?), one can use the method you mention.
But still, doing this for every Token you
On Tue, Aug 4, 2009 at 8:31 AM, Shai Erera wrote:
> Hi Darren,
>
> The question was, how given a string "aboutus" in a document, you can return
> that document as a result to the query "about us" (note the space). So we're
> mostly discussing how to detect and then break the word "aboutus" to two
>
Ah, OK. Interesting problem there as well.
I'll think on that one some too!
cheers.
Hi Darren,
The question was, how given a string "aboutus" in a document, you can return
that document as a result to the query "about us" (note the space). So we're
mostly discussing how to detect and then break the word "aboutus" to two
words.
What you wrote though seems interesting as well, onl
Just catching this thread, but if I understand what is being asked I can
share how I do multi-word phrase matching. If that's not what's wanted,
pardons!
Ok, I load an entire dictionary into a lucene index, phrases and all.
When I'm scanning some text, I do lookups in this dictionary index using
On Tue, Aug 4, 2009 at 3:56 AM, Shai Erera wrote:
> 2) Use a dictionary (real dictionary), and search it for every substring,
> e.g. "a", "ab", "abo" ... "about" etc. If you find a match, split it there.
> This needs some fine tuning, like checking if the rest is also a word and if
> the full strin
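The dictionary approach quoted above could be sketched roughly as follows. This is only an illustration: the word set WORDS is a tiny hypothetical stand-in for a real dictionary, split_compound is a made-up name, and the rule of accepting a split only when both halves are dictionary words is one possible reading of the "fine tuning" mentioned, not a confirmed part of the original suggestion.

```python
# Hypothetical mini-dictionary; a real one would be loaded from a word list.
WORDS = {"a", "about", "us", "credit", "card", "cards", "book", "mark", "marks"}

def split_compound(token, dictionary):
    """Return every (head, tail) split where both parts are dictionary words."""
    token = token.lower()
    splits = []
    # Try each prefix in turn: "a", "ab", "abo", ... up to the full token minus one.
    for i in range(1, len(token)):
        head, tail = token[:i], token[i:]
        if head in dictionary and tail in dictionary:
            splits.append((head, tail))
    return splits

for token in ["aboutus", "creditcards", "bookmarks", "lucene"]:
    print(token, split_compound(token, WORDS))
```

Requiring the tail to be a word as well avoids splitting "aboutus" after the prefix "a". A token can still have several valid splits, in which case the caller has to choose, for example by preferring the longest head.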
> 'll index it. Is there any technique to use while indexing?
> I am using Lucene version 2.4.0. Please advise.
--
View this message in context:
http://www.nabble.com/Searching-doubt-tp24802552p24805609.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
If I use this code for my search, it gives me unwanted search hits.

As I mentioned, searching for "about us" is only an example. There may be
more URLs like this, for example "credit cards" and "book marks". How do I
handle them?
(query);
but it is returning 0 hits. Where am I going wrong? Please help me out
with this.
http://www./aboutus/.xyz/
http://www./aboutus/.....def/

If I search for "aboutus", the results come up correctly. Can anyone
please suggest how to handle this situation?