MoreLikeThis API changes?

2007-05-29 Thread Ryan McKinley
I'm trying to build a custom MoreLikeThis implementation that will run within Solr, and I've run into a few API hurdles... 1. Can MLT.java be modified to optionally take the Similarity implementation in the constructor? Currently it is hardcoded to: private Similarity similarity = new Default

Re: Is it possible to do near terms without using phrase slop in query parser syntax?

2007-05-29 Thread Doron Cohen
Chris Hostetter <[EMAIL PROTECTED]> wrote on 29/05/2007 12:51:38: > > : I've found that trying to specify a near query using something like: > : actor_name_mv:"Foster, Jody"~2 > : matches "Foster, Jody" with a tf score of 1, but it matches "Jody > : Foster" with a tf score of .577 The phraseFreq i

RE: adding/searching documents during optimize

2007-05-29 Thread Mordo, Aviran (EXP N-NANNATEK)
1. Yes, it is safe to search while optimizing and adding documents to an index. 2. No, you cannot add documents to an index while it is being optimized. You can only have one instance of IndexWriter working on an index. HTH Aviran http://www.aviransplace.com http://shaveh.co.il -Original Message--

Re: Is it possible to do near terms without using phrase slop in query parser syntax?

2007-05-29 Thread Chris Hostetter
: I've found that trying to specify a near query using something like: : actor_name_mv:"Foster, Jody"~2 : matches "Foster, Jody" with a tf score of 1, but it matches "Jody : Foster" with a tf score of .577 The phraseFreq in the first case is 1 : and the phraseFreq in the second is 1/3. as i reca
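The two tf values quoted above can be reproduced with a little arithmetic. The following is a minimal sketch, not a call into Lucene itself: it re-implements the two DefaultSimilarity formulas (sloppyFreq = 1 / (distance + 1) and tf = sqrt(freq)) in a hypothetical class, and it assumes the reversed-order match is counted with an edit distance of 2, which is consistent with the reported phraseFreq of 1/3.

```java
public class SloppyTfSketch {
    // Same formula as DefaultSimilarity.sloppyFreq(int distance): 1 / (distance + 1).
    public static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    // Same formula as DefaultSimilarity.tf(float freq): sqrt(freq).
    public static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    public static void main(String[] args) {
        // Exact-order match: distance 0, so phraseFreq is 1.
        System.out.println("exact order tf    = " + tf(sloppyFreq(0)));
        // Reversed order: ASSUMED distance 2, so phraseFreq is 1/3 and tf is about 0.577.
        System.out.println("reversed order tf = " + tf(sloppyFreq(2)));
    }
}
```

Under these assumptions, the lower score for "Jody Foster" comes entirely from the sloppy-phrase distance penalty, so a custom Similarity that overrides sloppyFreq would be one place to change it.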

Re: (Sort of) transactional behavior

2007-05-29 Thread Michael McCandless
"Carlos Pita" <[EMAIL PROTECTED]> wrote: > I have a searcher and a writer, the writer writes N changes, then the > searcher is reopened to reflect them. Depending on whether autoCommit is > false or true for the writer it could have to be closed after the N-changes > batch too, just to make visibl

Re: Modifying StandardAnalyzer so that it also splits words after punctuation characters that are not followed by whitespace

2007-05-29 Thread Steven Rowe
Hi Michael, Michael Böckling wrote: > Hi folks! > > The topic says it all: I want to modify the StandardAnalyzer so that it also > splits words after punctuation characters (.,: etc.) that are NOT followed > by a whitespace character, in addition to punctuation characters that ARE > followed by w

Re: synchronize hits variable?

2007-05-29 Thread Erick Erickson
Except pay heed to the documentation that says something about calling IndexReader.doc() inside the loop is expensive. Although lazy loading is, I think, designed to help with this Erick On 5/29/07, John Powers <[EMAIL PROTECTED]> wrote: If I need all the documents returned by the query to

Fw: About Lucene-patch-446

2007-05-29 Thread Doron Cohen
- taking this discussion back to the user list - "Huajing Li" wrote on 29/05/2007: > Hi Doron, > > Days ago I published a post in the Lucene user mailing list asking > about merging database data with Lucene que

Re: Modifying StandardAnalyzer so that it also splits words after punctuation characters that are not followed by whitespace

2007-05-29 Thread Erick Erickson
Well, one possibility is to do something simpler. Rather than modifying StandardAnalyzer, modify the input stream. That is, substitute spaces for punctuation NOT followed by whitespace and then just let the analyzer handle the result. For that matter, if you're going to alter the input stream bef
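The pre-processing idea in this reply could be sketched roughly as follows. This is only an illustration using java.util.regex; the class and method names are hypothetical, and it assumes that ASCII punctuation (the POSIX \p{Punct} class) is what needs to be split on. The cleaned string would then be handed to the analyzer unchanged.

```java
import java.util.regex.Pattern;

public class PunctuationPreprocessor {
    // A punctuation character that is immediately followed by a non-whitespace character.
    // ASSUMPTION: \p{Punct} (ASCII punctuation) covers the characters of interest.
    private static final Pattern PUNCT_BEFORE_NON_SPACE =
            Pattern.compile("\\p{Punct}(?=\\S)");

    // Replace each such punctuation character with a single space.
    // Punctuation already followed by whitespace, or at the end of the text, is left alone.
    public static String spaceOutPunctuation(String text) {
        return PUNCT_BEFORE_NON_SPACE.matcher(text).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println(spaceOutPunctuation("foo.bar,baz: qux"));
    }
}
```

One trade-off of this approach is that tokens StandardAnalyzer would normally keep whole, such as host names or e-mail addresses, get split as well, since the dots and at-signs are replaced before the analyzer sees them.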

(Sort of) transactional behavior

2007-05-29 Thread Carlos Pita
Hi all, I have a searcher and a writer, the writer writes N changes, then the searcher is reopened to reflect them. Depending on whether autoCommit is false or true for the writer it could have to be closed after the N-changes batch too, just to make visible the flushed changes. But suppose for n

Re: WhitespaceAnalyzer [was: Re: regaridng Reader.terms()]

2007-05-29 Thread Steven Rowe
Hi Mohammad, Mohammad Norouzi wrote: > [Hoss wrote:] >> ...are there Persian characters with a category type of SPACE_SEPARATOR, >> LINE_SEPARATOR, or PARAGRAPH_SEPARATOR ? > > How can I know that? The Unicode standard's codes[1] for these are: SPACE SEPARATOR: Zs LINE SEPARATOR: Zl PA
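One way to check a character's category from Java is Character.getType, whose constants SPACE_SEPARATOR, LINE_SEPARATOR and PARAGRAPH_SEPARATOR correspond to Zs, Zl and Zp. The sketch below is a minimal illustration with hypothetical class and method names; the sample code points are chosen only for demonstration (U+200C, the zero-width non-joiner, is included because it is common in Persian text and has category Cf rather than one of the separator categories).

```java
public class SeparatorCheck {
    // True if the code point has Unicode general category Zs, Zl or Zp.
    public static boolean isSeparator(int codePoint) {
        int type = Character.getType(codePoint);
        return type == Character.SPACE_SEPARATOR
                || type == Character.LINE_SEPARATOR
                || type == Character.PARAGRAPH_SEPARATOR;
    }

    public static void main(String[] args) {
        // Sample code points: SPACE, LINE SEPARATOR, PARAGRAPH SEPARATOR, ZERO WIDTH NON-JOINER.
        int[] samples = {0x0020, 0x2028, 0x2029, 0x200C};
        for (int cp : samples) {
            System.out.printf("U+%04X separator=%b%n", cp, isSeparator(cp));
        }
    }
}
```

The same method could be run over a whole range of code points to list every separator character a given Java runtime knows about.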

RE: synchronize hits variable?

2007-05-29 Thread John Powers
If I need all the documents returned by the query to the Hits object, does a HitCollector work? I take it that each time I call Hits.doc(i) I get the full document; so if that's the problem, can I just get all of the single column I need for all those docs? Size: the directory I'm dealing with

Modifying StandardAnalyzer so that it also splits words after punctuation characters that are not followed by whitespace

2007-05-29 Thread Michael Böckling
Hi folks! The topic says it all: I want to modify the StandardAnalyzer so that it also splits words after punctuation characters (.,: etc.) that are NOT followed by a whitespace character, in addition to punctuation characters that ARE followed by whitespace. Of course I've looked at StandardToke

Is it possible to do near terms without using phrase slop in query parser syntax?

2007-05-29 Thread Daniel Einspanjer
I've got a field that indexes people's names. The field is multivalued and I'm using Solr with a positionIncrementGap of 100. I've found that trying to specify a near query using something like: actor_name_mv:"Foster, Jody"~2 matches "Foster, Jody" with a tf score of 1, but it matches "Jody Fo

adding/searching documents during optimize

2007-05-29 Thread Joe
Hi, I am not sure, so I need your opinion on these 2 questions: Is it safe to search an index while it's being optimized by another Java process? Is it safe to add documents to an index while it's being optimized by another Java process?

Re: maxDoc and arrays

2007-05-29 Thread Erick Erickson
As far as I know, no changes are visible to an already-opened reader so for the life of that reader document IDs are unchanged. Erick On 5/28/07, Carlos Pita <[EMAIL PROTECTED]> wrote: Hi again, On 5/24/07, Yonik Seeley <[EMAIL PROTECTED]> wrote: > Currently, a deleted doc is removed when th

Shortest snippet in search results

2007-05-29 Thread Prasanna Seshadri
Hello users, I am currently developing an algorithm to calculate the shortest snippet from the search results for a given keyword of length n (from the user query). In the Lucene source I found that there is a method getBestFragments which would do the same. However, it's very hard to interpret

Re: Does Lucene search over memory too?

2007-05-29 Thread Michael McCandless
"Antony Bowesman" <[EMAIL PROTECTED]> wrote: > Doron Cohen wrote: > > Antony Bowesman <[EMAIL PROTECTED]> wrote on 28/05/2007 22:48:41: > > > >> I read the new IndexWriter Javadoc and I'm unclear about this > >> autocommit. In > >> 2.1, I thought an IndexReader opened in an IndexSearcher does not

Re: Does Lucene search over memory too?

2007-05-29 Thread Antony Bowesman
Doron Cohen wrote: Antony Bowesman <[EMAIL PROTECTED]> wrote on 28/05/2007 22:48:41: I read the new IndexWriter Javadoc and I'm unclear about this autocommit. In 2.1, I thought an IndexReader opened in an IndexSearcher does not "see" additions to an index made by an IndexWriter, i.e. maxDoc an

Re: Does Lucene search over memory too?

2007-05-29 Thread Michael McCandless
"SK R" <[EMAIL PROTECTED]> wrote: > Hi Michael McCandless, > Thanks a lot for this clarification. > Calling writer.flush() before every search is the solution for my > case. > But this may cause any performance issues(i.e) more time or more memory > requirement? > Any idea about ti

MultiSearcher, Hits and createWeight

2007-05-29 Thread Israel Tsadok
Hi, I am developing a distributed index, using MultiSearcher and RemoteSearcher. When investigating some performance issues, I noticed that there is a lot of back-and-forth traffic between the servers during the weight calculation. Although MultiSearcher has a method called createWeight that minim

Re: Does Lucene search over memory too?

2007-05-29 Thread Michael McCandless
"Doron Cohen" <[EMAIL PROTECTED]> wrote: > Antony Bowesman <[EMAIL PROTECTED]> wrote on 28/05/2007 22:48:41: > > > I read the new IndexWriter Javadoc and I'm unclear about this > > autocommit. In > > 2.1, I thought an IndexReader opened in an IndexSearcher does not "see" > > additions to an index

Re: Does Lucene search over memory too?

2007-05-29 Thread SK R
Hi Michael McCandless, Thanks a lot for this clarification. Calling writer.flush() before every search is the solution for my case. But could this cause performance issues (i.e., more time or a higher memory requirement)? Any idea about the time taken by writer.flush()? Thanks & Regards RSK On