Re: Explain Scoring function in LMJelinekMercerSimilarity Class

2016-12-20 Thread Dwaipayan Roy
Waiting for an explanation for my query. Thank you very much.
On Tue, Dec 20, 2016 at 10:51 PM, Dwaipayan Roy wrote:
> Hello,
>
> Can anyone help me understand the scoring function in the
> LMJelinekMercerSimilarity class?
>
> The scoring function in LMJelinekMercerSimilarity is shown below:
> -

Re: Email id tokenizer (actual email id & multiple terms)

2016-12-20 Thread Trejkaz
On Wed, Dec 21, 2016 at 1:21 AM, Ahmet Arslan wrote:
> Hi,
>
> You can index the whole address in a separate field.
> Otherwise, how would you handle positions of the split tokens?
>
> By the way, the speed of phrase search may be just fine, so consider trying it first.
Speed aside, phrase search is difficu

Re: Explain Scoring function in LMJelinekMercerSimilarity Class

2016-12-20 Thread Will Martin
https://doi.org/10.3115/981574.981579 On 12/20/2016 12:21 PM, Dwaipayan Roy wrote: Hello, Can anyone help me understand the scoring function in the LMJelinekMercerSimilarity class? The scoring function in LMJelinekMercerSimilarity is shown below: -

Re: ComplexPhraseQueryParser with wildcards

2016-12-20 Thread Mikhail Khludnev
It probably deserves a jira, although it's minor.
On Tue, Dec 20, 2016 at 6:18 PM, Otmar Caduff wrote:
> Thanks for your response, Ahmet!
>
> I agree, a meaningful phrase query should have at least two terms. However,
> why should the query "john" (without wildcard) then work?
>
> I'm trying to

Explain Scoring function in LMJelinekMercerSimilarity Class

2016-12-20 Thread Dwaipayan Roy
Hello, Can anyone help me understand the scoring function in the LMJelinekMercerSimilarity class? The scoring function in LMJelinekMercerSimilarity is shown below: float score = stats.getTotalBoost() * (float)Math.log(1 + ((1 - lambda) * fr
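As background to the question, the widely used rank-equivalent form of Jelinek-Mercer smoothing scores one query term as log(1 + ((1 - lambda) * tf / docLen) / (lambda * P(t|C))). The following is a minimal sketch of that textbook form in plain Java. It does not call Lucene, and it is not claimed to reproduce the truncated expression in the post; the class name `JelinekMercerSketch` and the parameter names are hypothetical, chosen only for illustration.

```java
// Hypothetical, standalone illustration; not Lucene's implementation.
final class JelinekMercerSketch {

    /**
     * Scores a single query term with Jelinek-Mercer smoothing.
     *
     * @param boost                 multiplicative boost applied to the term
     * @param lambda                smoothing weight in (0, 1]; larger values lean on the collection model
     * @param freq                  frequency of the term in the document
     * @param docLen                length of the document in tokens
     * @param collectionProbability probability of the term in the whole collection, P(t|C)
     */
    static double score(double boost, double lambda, double freq,
                        double docLen, double collectionProbability) {
        if (lambda <= 0 || lambda > 1) {
            throw new IllegalArgumentException("lambda must be in (0, 1]");
        }
        if (docLen <= 0 || collectionProbability <= 0) {
            throw new IllegalArgumentException("docLen and collectionProbability must be positive");
        }
        // Document language model component, weighted by (1 - lambda).
        double documentPart = (1 - lambda) * freq / docLen;
        // Collection language model component, weighted by lambda.
        double collectionPart = lambda * collectionProbability;
        // The "1 +" keeps the score at zero when the term is absent from the document.
        return boost * Math.log(1 + documentPart / collectionPart);
    }

    public static void main(String[] args) {
        // Assumed example values: term occurs twice in a 100-token document.
        System.out.println(score(1.0, 0.5, 2, 100, 0.001));
    }
}
```

In this form a term that does not occur in the document contributes exactly zero, which is why the ratio is wrapped in `log(1 + ...)` instead of a plain logarithm.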

Re: ComplexPhraseQueryParser with wildcards

2016-12-20 Thread Otmar Caduff
Thanks for your response, Ahmet! I agree, a meaningful phrase query should have at least two terms. However, why should the query "john" (without wildcard) then work? I'm trying to figure out if I can use ComplexPhraseQueryParser as a default in my application or if I have to handle some cases di

Re: Email id tokenizer (actual email id & multiple terms)

2016-12-20 Thread Ahmet Arslan
Hi, You can index the whole address in a separate field. Otherwise, how would you handle positions of the split tokens? By the way, the speed of phrase search may be just fine, so consider trying it first. Ahmet On Tuesday, December 20, 2016 5:15 PM, suriya prakash wrote: Hi, I am using standard anal

Re: ComplexPhraseQueryParser with wildcards

2016-12-20 Thread Ahmet Arslan
Hi Otmar, A single term inside quotes is meaningless. A phrase query should have at least two terms in it, shouldn't it? What is your intention with such a "john*" query? Ahmet On Tuesday, December 20, 2016 4:56 PM, Otmar Caduff wrote: Hi, I have an index with a single document with a fi

Email id tokenizer (actual email id & multiple terms)

2016-12-20 Thread suriya prakash
Hi, I am using the standard analyzer and want to split the token for email_id "luc...@gmail.com" into "lucene", "gmail", "com", "luc...@gmail.com" in a single pass. I have already changed the JFlex grammar to split the email id into separate words (lucene, gmail, com). But we need to do phrase search which will not be efficie
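One possible token layout for the single-pass requirement described above is to emit the whole address and its first part at the same position, then the remaining parts at the following positions. The sketch below shows that layout in plain Java. It is only an illustration under assumptions: `EmailTokenSketch`, `Token`, and `tokenize` are hypothetical names, the address `lucene@example.com` is a made-up stand-in, and no Lucene or JFlex API is used.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, standalone illustration of a token layout; not a Lucene tokenizer.
final class EmailTokenSketch {

    /** A token plus its position increment relative to the previous token. */
    static final class Token {
        final String text;
        final int positionIncrement;

        Token(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return text + "(+" + positionIncrement + ")";
        }
    }

    /**
     * Emits the whole address first, then its parts split on '@' and '.'.
     * The first part gets increment 0, so it shares a position with the whole address.
     */
    static List<Token> tokenize(String email) {
        List<Token> tokens = new ArrayList<>();
        tokens.add(new Token(email, 1));
        boolean first = true;
        for (String part : email.split("[@.]")) {
            if (part.isEmpty()) {
                continue; // skip empty pieces from leading or doubled separators
            }
            tokens.add(new Token(part, first ? 0 : 1));
            first = false;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("lucene@example.com"));
    }
}
```

With this layout the parts occupy consecutive positions, so a phrase over the parts can still match, while the whole address remains available as a single term at the first position.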

ComplexPhraseQueryParser with wildcards

2016-12-20 Thread Otmar Caduff
Hi, I have an index with a single document with a field "field" and textual content "johnny peters" and I am using org.apache.lucene.queryparser.complexPhrase.ComplexPhraseQueryParser to parse the query: field: (john* peter) When searching with this query, I am getting the document as expected.