Re: new to lucene, non standard index

2011-05-05 Thread Chris Schilling
Hey Mike, My only concern is that I am replacing a large number of fields inside of a Document with a (very large ~50e6) number of Documents. Will I not run into the same memory issues? Or do I create only one doc object and reuse it? With so many Doc/Token pairs, won't searching the index t

Re: new to lucene, non standard index

2011-05-05 Thread Mike Sokolov
I think the solution I gave you will work. The only problem is if a token appears twice in the same doc: doc1 has foo with two different sets of weights and frequencies... but I think you're saying that doesn't happen. On 05/05/2011 06:09 PM, Chris Schilling wrote: Hey Mike, Let me clarify:

Re: new to lucene, non standard index

2011-05-05 Thread Chris Schilling
Oh, yes, they are unique within a document. I was also thinking about something like this. But I would be replacing a large number of fields within a document by a large number of documents. Let me see if I can work that out. On May 5, 2011, at 3:01 PM, Mike Sokolov wrote: > Are the tokens

Re: new to lucene, non standard index

2011-05-05 Thread Chris Schilling
Hey Mike, Let me clarify: The tokens are not unique. Let's say doc1 contains the token foo and has the properties weight1 = 0.75, weight2 = 0.90, frequency = 10 Now, let's say doc2 also contains the token foo with properties: weight1 = 0.8, weight2 = 0.75, frequency = 5 Now, I want to search

Re: new to lucene, non standard index

2011-05-05 Thread Mike Sokolov
Are the tokens unique within a document? If so, why not store a document for every doc/token pair with fields: id (doc#/token#) doc-id (doc#) token weight1 weight2 frequency Then search for token, sort by weight1, weight2 or frequency. If the token matches are unique within a document you will
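
A minimal sketch of the layout suggested above, written in plain Python rather than against the Lucene API, only to show the shape of the data: one record per doc/token pair, then filter by token and sort by the chosen field. The field names follow the message; the sample values are the doc1/doc2 figures from Chris's clarification; `flatten` and `search` are hypothetical helper names, not Lucene calls.

```python
from operator import itemgetter


def flatten(docs):
    """Turn {doc_id: {token: (weight1, weight2, frequency)}} into
    one flat record per doc/token pair."""
    records = []
    for doc_id, tokens in docs.items():
        for token, (weight1, weight2, frequency) in tokens.items():
            records.append({
                "id": f"{doc_id}/{token}",  # unique key: doc#/token#
                "doc_id": doc_id,
                "token": token,
                "weight1": weight1,
                "weight2": weight2,
                "frequency": frequency,
            })
    return records


def search(records, token, sort_by, descending=True):
    """Return the records matching `token`, ordered by the `sort_by` field."""
    hits = [r for r in records if r["token"] == token]
    return sorted(hits, key=itemgetter(sort_by), reverse=descending)


# Sample data taken from the thread (doc1 and doc2 both contain "foo").
docs = {
    "doc1": {"foo": (0.75, 0.90, 10)},
    "doc2": {"foo": (0.8, 0.75, 5)},
}
records = flatten(docs)
by_weight1 = [r["doc_id"] for r in search(records, "foo", "weight1")]
by_frequency = [r["doc_id"] for r in search(records, "foo", "frequency")]
```

Because each token occurs at most once per original document, every hit maps back to exactly one `doc_id`, so no de-duplication step is needed after sorting.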

new to lucene, non standard index

2011-05-05 Thread Chris Schilling
Hi, I am trying to figure out how to solve this problem: I have about 500,000 files that I would like to index, but the files are structured. So, each file has the following layout: doc1 token1, weight11, frequency1, weight21 token2, weight12, frequency2, weight22 . . . etc for 500,000 docs.

Querying Lucene property for exact value

2011-05-05 Thread ces3w
Hi, I am new to Lucene, so I apologize if this has been answered, but I've had no success finding the answer after googling around. I am using Compass as a Lucene front end and have run into an issue in querying Lucene docs. I am looking for a way to search a property based on its complete and

Re: QueryValidator

2011-05-05 Thread Mike Sokolov
It's an idea - sorry I don't have an implementation I can share easily; it's embedded in our application code and not easy to refactor. I'm not sure where this would fit in the solr architecture; maybe some subclass of SearchHandler? I guess the query rewriter would need to be aware of which

Re: Using Solr's (Auto)suggest with plain lucene

2011-05-05 Thread Michael McCandless
Also, have a look at the patch on this issue: https://issues.apache.org/jira/browse/LUCENE-2995 That issue factors out spell checking / auto suggest from Lucene & Solr into a shared module. Mike http://blog.mikemccandless.com On Thu, May 5, 2011 at 8:54 AM, Clemens Wyss wrote: > I have im

Re: QueryValidator

2011-05-05 Thread Bernd Fehling
Hi Michael, sounds excellent to me. Is it a QParserPlugin or what is it? Regards Bernd On 05.05.2011 14:05, Michael Sokolov wrote: In our applications, we catch ParseException and then take one of the following actions: 1) report an error to the user 2) rewrite the query, stripping all p

Re: Using Solr's (Auto)suggest with plain lucene

2011-05-05 Thread Dawid Weiss
If you check out the source code of Solr/Lucene, look at the FSTLookup class and FSTLookupTest -- you can populate FSTLookup manually with terms/phrases from your index and then use the resulting automaton for suggestions. Dawid On Thu, May 5, 2011 at 2:54 PM, Clemens Wyss wrote: > I have implemen
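
To illustrate the populate-then-look-up flow described above, here is a rough Python sketch. It is not FSTLookup and does not build an automaton: it swaps in a sorted list with binary search, which gives the same kind of prefix lookup on a small scale. `PrefixSuggester`, its methods, and the sample terms and weights are all hypothetical.

```python
import bisect


class PrefixSuggester:
    """Prefix lookup over a sorted term list (a stand-in for an FST)."""

    def __init__(self, weighted_terms):
        # weighted_terms: iterable of (term, weight) pairs, e.g. index
        # terms with their document frequencies.
        self._entries = sorted(weighted_terms)
        self._terms = [term for term, _ in self._entries]

    def suggest(self, prefix, limit=5):
        """Return up to `limit` terms starting with `prefix`, heaviest first."""
        # All terms sharing a prefix are contiguous in sorted order,
        # so find the first candidate and scan until the prefix stops matching.
        start = bisect.bisect_left(self._terms, prefix)
        matches = []
        for term, weight in self._entries[start:]:
            if not term.startswith(prefix):
                break
            matches.append((term, weight))
        matches.sort(key=lambda entry: (-entry[1], entry[0]))
        return [term for term, _ in matches[:limit]]


suggester = PrefixSuggester([
    ("lucene", 10),
    ("lucid", 3),
    ("luck", 7),
    ("solr", 9),
])
suggestions = suggester.suggest("luc")
```

An FST stores the same term set far more compactly, which matters once the term list is too large to hold comfortably as plain strings.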

Using Solr's (Auto)suggest with plain lucene

2011-05-05 Thread Clemens Wyss
I have implemented my index (in fact it's a pluggable indexing API) in "plain Lucene". I tried to implement a term suggestion mechanism on my own, but I am not too happy with it so far. At http://search-lucene.com/m/0QBv41ssGlh/suggestion&subj=Auto+Suggest I have seen Solr's auto suggestion for search terms.

Re: Anyway to not bother scoring less good matches ?

2011-05-05 Thread Paul Taylor
On 05/05/2011 11:59, Ian Lea wrote: See http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-1 for an excellent article and solution to the problem with common words. Would this work when the user doesn't actually use a phrase query? You could also consider using

Re: QueryValidator

2011-05-05 Thread Michael Sokolov
In our applications, we catch ParseException and then take one of the following actions: 1) report an error to the user 2) rewrite the query, stripping all punctuation, and try again 3) rewrite the query, quoting all punctuation, and try again. Would that work for you? On 5/5/2011 3:26 AM, Bern
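
The three actions listed above can be sketched roughly as follows. This is plain Python with a toy parser, not Lucene's QueryParser: `toy_parse`, `ParseError`, and the helper names are hypothetical, and "quoting" punctuation is assumed here to mean backslash-escaping it.

```python
import re


class ParseError(Exception):
    """Stand-in for the parser's ParseException."""


def toy_parse(query):
    """Hypothetical parser: rejects unbalanced parentheses, else returns the query."""
    depth = 0
    i = 0
    while i < len(query):
        ch = query[i]
        if ch == "\\":
            i += 2  # a backslash escapes the next character
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ParseError("unbalanced ')'")
        i += 1
    if depth != 0:
        raise ParseError("unbalanced '('")
    return query


def strip_punctuation(query):
    """Action 2: drop every punctuation character and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", query).split())


def escape_punctuation(query):
    """Action 3: backslash-escape every punctuation character."""
    return re.sub(r"([^\w\s])", r"\\\1", query)


def parse_with_fallback(query, rewrite=strip_punctuation, parse=toy_parse):
    """Parse the query; on failure, rewrite it once and try again."""
    try:
        return parse(query)
    except ParseError:
        if rewrite is None:
            raise  # action 1: let the caller report the error to the user
        return parse(rewrite(query))


stripped = parse_with_fallback("foo (bar")
escaped = parse_with_fallback("foo (bar", rewrite=escape_punctuation)
```

Stripping changes what the user searched for, while escaping keeps the characters as literals, so which rewrite is appropriate depends on how the analyzer treats punctuation.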

Re: Anyway to not bother scoring less good matches ?

2011-05-05 Thread Ian Lea
See http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-1 for an excellent article and solution to the problem with common words. You could also consider using, and caching and reusing, filters for the tnum and tracks fields. -- Ian. On Thu, May 5, 2011 at 11

Re: Anyway to not bother scoring less good matches ?

2011-05-05 Thread Paul Taylor
On 05/05/2011 11:13, Ahmet Arslan wrote: Yes correct, but I have looked at the list of optimizations before. What was clear from profiling was that it wasn't the searching part that was slow (a query run on the same index with only a few matching docs ran super fast); the slowness only occurs when

Re: Anyway to not bother scoring less good matches ?

2011-05-05 Thread Ahmet Arslan
> Yes correct, but I have looked at the list of > optimizations before. What was clear from profiling was that > it wasn't the searching part that was slow (a query run on > the same index with only a few matching docs ran super fast); > the slowness only occurs when there are loads of matching > do

QueryValidator

2011-05-05 Thread Bernd Fehling
Dear list, I need a QueryValidator and don't mind writing one, but I don't want to reinvent the wheel in case there is already something. Is this the right list for talking about a QueryValidator or should it belong to SOLR? What do I mean by a QueryValidator? I am thinking of something like valida

Re: Anyway to not bother scoring less good matches ?

2011-05-05 Thread Paul Taylor
On 05/05/2011 00:24, Ahmet Arslan wrote: Thanks again, now done that but still not having much effect on total time. So your main concern is enhancing the running time, not decreasing the number of returned results? Additionally http://wiki.apache.org/lucene-java/ImproveSearchingSpeed Yes c

Re: Anyway to not bother scoring less good matches ?

2011-05-05 Thread Paul Taylor
On 05/05/2011 00:24, Chris Hostetter wrote: : Well I did extend QueryParser, and the method is being called but rather : disappointingly it had no noticeable effect on how long queries took. I : really thought by reducing the number of matches the corresponding scoring : phase would be quicker.