Most frequently indexed term

2009-05-25 Thread Ganesh
Hello All, I need to build some stats. I need to know the top 5 most frequently indexed terms in a date range (in a day or a month). Any idea how to achieve this? Regards, Ganesh
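One way to think about this question, sketched in plain Java without Lucene, is to keep only the documents whose date falls in the range, count every term, and sort by count. The `Doc` record, the method names, and the lowercase-plus-whitespace tokenization are all hypothetical. Note that the sketch counts total occurrences rather than documents. Inside Lucene you would have to gather equivalent counts from the index for the documents matching the date range.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

public class TopTermsInRange {
    // Hypothetical stand-in for an indexed document: its date plus its raw text.
    record Doc(LocalDate date, String text) {}

    // Counts term occurrences in documents dated within [from, to] (inclusive)
    // and returns the k most frequent terms, ties broken alphabetically.
    static List<Map.Entry<String, Integer>> topTerms(List<Doc> docs, LocalDate from, LocalDate to, int k) {
        Map<String, Integer> counts = new HashMap<>();
        for (Doc d : docs) {
            if (d.date().isBefore(from) || d.date().isAfter(to)) {
                continue; // outside the requested date range
            }
            // Assumption: naive lowercase + whitespace tokenization, not a Lucene analyzer.
            for (String term : d.text().toLowerCase(Locale.ROOT).trim().split("\\s+")) {
                if (!term.isEmpty()) {
                    counts.merge(term, 1, Integer::sum);
                }
            }
        }
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(k)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc(LocalDate.of(2009, 5, 25), "lucene index search lucene"),
                new Doc(LocalDate.of(2009, 5, 25), "search highlight lucene"),
                new Doc(LocalDate.of(2009, 4, 1), "old old old"));
        // Top 5 terms for May 2009; prints [lucene=3, search=2, highlight=1, index=1]
        System.out.println(topTerms(docs, LocalDate.of(2009, 5, 1), LocalDate.of(2009, 5, 31), 5));
    }
}
```

For a single day, pass the same date as both `from` and `to`.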

Re: relevance function for scores

2009-05-25 Thread kenny kim
Hi, I think you and I are looking for the same thing. I believe it could dramatically reduce search time for my heavy indexes. Could you let me know if you find a good solution? Have a good day. On 2009-05-18, at 9:52 PM, Joel Halbert wrote: Hi, I'd like to apply a score filter

Re: New user in lucene

2009-05-25 Thread Alexander Aristov
From my point of view, the best option for you would be to use Solr. You can integrate it with any web app or HTML page. Alexander 2009/5/26 StanleyTan > > hi all, > > i'm new to lucene. will like to ask, am i able to integrate lucene search > into my normal html pages, not web applications? meanin

New user in lucene

2009-05-25 Thread StanleyTan
Hi all, I'm new to Lucene. I would like to ask: am I able to integrate Lucene search into my normal HTML pages, not web applications? Meaning just viewing HTML pages and searching them using Lucene, via JavaScript or a Java applet. If possible, how can I do that? Help will be very

Re: How to extract 15/20 words around the matched query after getting results from lucene searcher?

2009-05-25 Thread Hasan Diwan
2009/5/24 KK: > There is one more mail I found in the archive [3-4 days old] where someone > asked about extracting the 3 neighboring words around the match. I think once you > have the position of the matching term/phrase, extracting 3 or 30 neighbors > won't be different, right? because you just have to

Re: relevance function for scores

2009-05-25 Thread Babak Farhang
Whoops, got that backwards. It should read: > if (score[n] / score[n-1]) < c / (boost_factor) On Mon, May 25, 2009 at 4:10 PM, Babak Farhang wrote: > How about determining the cutoff by measuring the percentage > difference between successive scores: if the score drops by a > threshold amount the

Re: relevance function for scores

2009-05-25 Thread Babak Farhang
How about determining the cutoff by measuring the percentage difference between successive scores: if the score drops by a threshold amount then you've hit the cutoff. In the example you mention, you might want to try something like c/1000, where 1 < c < 25 is a constant (experiment to find a swee
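Babak's ratio test can be sketched as follows, over an array of hit scores already sorted in descending order (as a search returns them). The class and method names are hypothetical, `threshold` is the tuning constant discussed in the thread (for example c / boost_factor, per the corrected condition), and the sample scores are made up.

```java
import java.util.Arrays;

public class ScoreCutoff {
    // Returns how many leading hits to keep: scanning scores in descending order,
    // stop at the first hit n whose ratio score[n] / score[n-1] falls below threshold.
    static int cutoff(float[] scores, float threshold) {
        for (int n = 1; n < scores.length; n++) {
            // Guard against dividing by a zero score, then apply the ratio test.
            if (scores[n - 1] <= 0f || scores[n] / scores[n - 1] < threshold) {
                return n; // keep hits 0..n-1
            }
        }
        return scores.length; // no sharp drop found: keep everything
    }

    public static void main(String[] args) {
        float[] scores = {2.4f, 2.3f, 2.1f, 0.002f, 0.0019f};
        int keep = cutoff(scores, 0.5f);
        // prints 3 [2.4, 2.3, 2.1]
        System.out.println(keep + " " + Arrays.toString(Arrays.copyOf(scores, keep)));
    }
}
```

As Babak suggests, the right threshold has to be found by experiment on your own score distributions.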

Re: Hit highlighting for non-english unicode index/queries not working?

2009-05-25 Thread Robert Muir
As mentioned previously, I don't think your text is being analyzed the way you want. SimpleAnalyzer will break your word \u0BAA\u0BB0\u0BBF\u0BA3\u0BBE\u0BAE (பரிணாம) into 3 tokens: \u0BAA\u0BB0, \u0BA3, \u0BAE. Not only does it incorrectly split your word into three words, but it completely drops t
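Robert's point can be reproduced without Lucene. The sketch below (hypothetical class name) splits text into maximal runs of characters for which `Character.isLetter` is true, which, to my understanding, is the rule used by the letter-based tokenizer behind SimpleAnalyzer. Tamil vowel signs such as U+0BBF and U+0BBE are combining marks, not letters, so each one ends a token and is discarded.

```java
import java.util.ArrayList;
import java.util.List;

public class LetterSplitDemo {
    // Mimics a letter-only tokenizer: a token is a maximal run of chars
    // for which Character.isLetter is true; everything else is dropped.
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString()); // a non-letter ends the current token
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String word = "\u0BAA\u0BB0\u0BBF\u0BA3\u0BBE\u0BAE"; // பரிணாம
        for (String t : letterTokens(word)) {
            // Print each token as escaped code units so the split is visible on any console.
            StringBuilder sb = new StringBuilder();
            for (char c : t.toCharArray()) {
                sb.append(String.format("\\u%04X", (int) c));
            }
            System.out.println(sb);
        }
    }
}
```

Running it prints the same three tokens Robert lists, one per line, with both vowel signs gone.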

Re: Hit highlighting for non-english unicode index/queries not working?

2009-05-25 Thread Michael McCandless
Could you boil down this example to a smaller test case that fails? E.g. make a RAMDir, index one document (that should show highlighting), search it, run the highlighter, and show that it's not working? Mike On Mon, May 25, 2009 at 10:02 AM, KK wrote: > Hi, > I'm trying to index some non-english texts. I

Re: How to extract 15/20 words around the matched query after getting results from lucene searcher?

2009-05-25 Thread KK
Thank you very much @Michael. I googled and didn't find much, but I grabbed the book LIA 2nd Edn, went through it, and found a very good example in Sec. 8.7 that helped me solve the problem. Now I'm able to do highlighting for English texts, but for non-English text no luck yet. I've posted new

Re: How to extract 15/20 words around the matched query after getting results from lucene searcher?

2009-05-25 Thread Michael McCandless
I would do some googling to find examples, or read the javadocs for the highlighter package? Or pick up a copy of the early-release of Lucene in Action 2nd edition from http://manning.com [disclaimer: I'm one of the authors on that book!]. We've revamped the highlighter coverage (in chapter 8)...

Hit highlighting for non-english unicode index/queries not working?

2009-05-25 Thread KK
Hi, I'm trying to index some non-English texts. Indexing and searching are working fine. From the command line I'm able to provide UTF-8 Unicode text as input like this, \u0BAA\u0BB0\u0BBF\u0BA3\u0BBE\u0BAE and get the search results. Then I tried to add hit highlighting for the same. So I

Incorrect search result with PhraseQuery

2009-05-25 Thread ac
Hi, has anyone stumbled across this problem where PhraseQuery leads to incorrect results? In my specific case PhraseQuery would become equivalent to a set of disjunctive term queries. However, upon restarting my application (inside Tomcat) PhraseQuery would work again. The logic that produces query

Re: How to extract 15/20 words around the matched query after getting results from lucene searcher?

2009-05-25 Thread KK
Thanks @Michael. I had no idea about this contrib, though I'm looking into the highlighter now. Can you shed some light on it, i.e. the steps needed to achieve this? I'm completely new to this. Can you point me to some examples? Thank you. KK. On Mon, May 25, 2009 at 3:26

Re: How to extract 15/20 words around the matched query after getting results from lucene searcher?

2009-05-25 Thread Michael McCandless
Can't you use contrib/highlighter to achieve this? It can do both excerpting (grabbing a chunk of text around each hit) and highlighting (highlighting the specific tokens that matched, within that excerpt). Mike On Mon, May 25, 2009 at 5:20 AM, KK wrote: > Thanks for your response @Seid. > Can an
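For intuition about the highlighting half, here is a naive plain-Java sketch that wraps the matching words of an excerpt in `<B>` and `</B>` tags. It is not the contrib highlighter's API. The class and method names are hypothetical, and the sketch assumes lowercase query terms, whitespace-separated words, and exact matches. As I understand it, the real highlighter instead works from the analyzer's tokens and their offsets.

```java
import java.util.Locale;
import java.util.Set;
import java.util.StringJoiner;

public class NaiveHighlight {
    // Wraps every whitespace-delimited word that matches a query term
    // (case-insensitively; query terms assumed lowercase) in <B>...</B>.
    static String highlight(String excerpt, Set<String> queryTerms) {
        StringJoiner out = new StringJoiner(" ");
        for (String word : excerpt.trim().split("\\s+")) {
            boolean hit = queryTerms.contains(word.toLowerCase(Locale.ROOT));
            out.add(hit ? "<B>" + word + "</B>" : word);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints Lucene can <B>highlight</B> search <B>hits</B>
        System.out.println(highlight("Lucene can highlight search hits", Set.of("highlight", "hits")));
    }
}
```

Punctuation attached to a word (e.g. "hits.") would defeat the exact match here, which is one reason to rely on analyzer tokens instead.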

Re: How to extract 15/20 words around the matched query after getting results from lucene searcher?

2009-05-25 Thread KK
Thanks for your response @Seid. Can any Lucene user give me directions in this regard? I'm stuck. Really appreciate your help. Thanks, KK On Mon, May 25, 2009 at 2:43 PM, Seid Muhie wrote: > actually I used the normal java standard libraries for this work. I > used lucene only to retrieve the

Re: How to extract 15/20 words around the matched query after getting results from lucene searcher?

2009-05-25 Thread Seid Muhie
Actually, I used the standard Java libraries for this work. I used Lucene only to retrieve the relevant documents. What you do, though it is fairly manual since I don't know a way to do it with the Lucene API, is simply record the locations of the query terms in the document (it is as
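Seid's manual approach (record where the query terms occur, then take the neighbouring words) can be sketched in plain Java. The class and method names are hypothetical, words are whitespace-delimited, query terms are assumed lowercase, and `window` is the number of words kept on each side, so a 15/20-word excerpt is just a larger window. Splitting on whitespace rather than on letters also keeps Tamil vowel signs attached to their words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class KeywordInContext {
    // For each word position where a query term occurs, returns the words
    // from pos - window to pos + window (clipped to the document bounds).
    static List<String> windows(String text, Set<String> queryTerms, int window) {
        String[] words = text.trim().split("\\s+");
        List<String> snippets = new ArrayList<>();
        for (int pos = 0; pos < words.length; pos++) {
            if (queryTerms.contains(words[pos].toLowerCase(Locale.ROOT))) {
                int from = Math.max(0, pos - window);
                int to = Math.min(words.length, pos + window + 1); // exclusive end
                snippets.add(String.join(" ", Arrays.copyOfRange(words, from, to)));
            }
        }
        return snippets;
    }

    public static void main(String[] args) {
        String text = "the quick brown fox jumps over the lazy dog";
        // prints [quick brown fox jumps over]
        System.out.println(windows(text, Set.of("fox"), 2));
    }
}
```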

Re: how to get the word before and the word after the matched Term?

2009-05-25 Thread KK
One more piece of information I would like to add: # I'm building the index mostly for non-English texts/documents, and searching is done using Unicode UTF-8 text [it's obvious, right?] Thanks KK On Mon, May 25, 2009 at 10:58 AM, KK wrote: > Hi All. > I want to do the same thing with say a window of 10/15.