Multi Field AND Search

2009-06-18 Thread saurabhs_iitk
Hi, I have indexed 8 fields with different boosts. I am given a search string that consists of words and phrases, and I want to do an AND search of that search string on four fields and show the results ranked by boost. For me, the search string should occur completely in one of the fields and then the b

Re: update a specific document

2009-06-18 Thread Anshum
Hi Galaio, To update a document in Lucene this way, you'd have to first delete the document using IndexReader's delete document method and then re-add the document (thereby changing the internal docId as well). You may use: http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/index/IndexReader.html#de
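The delete-then-re-add behaviour described above can be illustrated with a toy model. This is only a sketch and not Lucene's API: `ToyIndex` and all of its methods are hypothetical names, and it assumes an append-only store in which the internal doc ID is simply the slot a document was written to. Under that assumption, an update by unique key necessarily hands the document a new internal ID.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; hypothetical class, NOT Lucene's API.
// Assumption: documents are appended and the internal doc ID is the slot number.
class ToyIndex {
    private final List<String> keys = new ArrayList<>();
    private final List<String> bodies = new ArrayList<>();
    private final List<Boolean> deleted = new ArrayList<>();

    /** Appends a document and returns its internal doc ID (its slot number). */
    int add(String key, String body) {
        keys.add(key);
        bodies.add(body);
        deleted.add(false);
        return keys.size() - 1;
    }

    /** Marks every live document with this key as deleted; returns how many were marked. */
    int deleteByKey(String key) {
        int count = 0;
        for (int i = 0; i < keys.size(); i++) {
            if (!deleted.get(i) && keys.get(i).equals(key)) {
                deleted.set(i, true);
                count++;
            }
        }
        return count;
    }

    /** Update = delete by unique key, then re-add. Returns the NEW internal doc ID. */
    int update(String key, String body) {
        deleteByKey(key);
        return add(key, body);
    }

    /** Returns the internal doc ID of the live document with this key, or -1 if none. */
    int find(String key) {
        for (int i = 0; i < keys.size(); i++) {
            if (!deleted.get(i) && keys.get(i).equals(key)) {
                return i;
            }
        }
        return -1;
    }

    /** Returns the stored body for an internal doc ID. */
    String body(int docId) {
        return bodies.get(docId);
    }
}
```

The practical point of the sketch is that the application-level unique key stays stable across updates while the internal ID does not, so the key is the safer handle for locating a document.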

Synchronizing Lucene indexes across 2 application servers

2009-06-18 Thread mitu2009
I have a web application that uses Lucene for search functionality. Lucene search requests are served by web services sitting on 2 application servers (IIS 7). The 2 application servers are load balanced using "netscaler". Both of these servers have a batch job running which updates search indexes on

Re: n-gram word support

2009-06-18 Thread Sameer Maggon
Yeah, look at the spellcheck component in Solr. They are doing something similar. Sameer. On Thu, Jun 18, 2009 at 7:15 PM, Neha Gupta wrote: > Hey, > > I was wondering if there is a way to read the index and generate n-grams of > words for a document in lucene? I am quite new to it and am using

n-gram word support

2009-06-18 Thread Neha Gupta
Hey, I was wondering if there is a way to read the index and generate n-grams of words for a document in Lucene? I am quite new to it and am using PyLucene. Thanks, Neha
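For readers unsure what "n-grams of words" means here, the following is a minimal sketch in plain Java that does not touch a Lucene index at all. It assumes the document text is already available as a string and that words are separated by whitespace; `WordNGrams` and `ngrams` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene or PyLucene.
class WordNGrams {
    /**
     * Returns every run of n consecutive whitespace-separated words in text,
     * each joined by a single space. Returns an empty list if the text has
     * fewer than n words.
     */
    static List<String> ngrams(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        List<String> result = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return result;
        }
        String[] words = trimmed.split("\\s+");
        // Slide a window of n words across the token array.
        for (int start = 0; start + n <= words.length; start++) {
            result.add(String.join(" ", Arrays.copyOfRange(words, start, start + n)));
        }
        return result;
    }
}
```

For example, `WordNGrams.ngrams("the quick brown fox", 2)` yields the three bigrams "the quick", "quick brown" and "brown fox".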

RE: Lucene performance: is search time linear to the index size?

2009-06-18 Thread Teruhiko Kurosaka
> From: Jay Booth [mailto:jbo...@wgen.net] > Are you fetching all of the results for your search? No, I'm not doing anything on the search results. This is essentially what I do: searcher = new IndexSearcher(IndexReader.open(indexFileDir)); query = new TermQuery(new Term(fieldNam

Re: Lucene performance: is search time linear to the index size?

2009-06-18 Thread Yonik Seeley
On Thu, Jun 18, 2009 at 3:54 PM, Teruhiko Kurosaka wrote: > Because the number of hits was proportional to the number > of Documents in the index in my previous test, I came > to a wrong conclusion that the search time is proportional > to the index size.  If I have only one Document that can > mat

update a specific document

2009-06-18 Thread João Silva
Hi, I want to update a specific document, but I didn't find updateDocument(Query) or updateDocument(Term[]). So to make an update, I will need a term with a unique ID so that I retrieve a unique document. Is there any way to access the internal document ID? For example, imagine that I have the

RE: Lucene performance: is search time linear to the index size?

2009-06-18 Thread Jay Booth
Are you fetching all of the results for your search? If so, you're actually measuring the time to pull n stored documents out of the index, not to search over an index of n documents. That would of course be linear: most of your cost there will be the I/O to actually pull the document from disk,

RE: Lucene performance: is search time linear to the index size?

2009-06-18 Thread Teruhiko Kurosaka
Erik, The way I test this program is by issuing 1000 queries and I have profiled it to make sure the start up cost is negligible. I ran a further test and discovered that the search time is actually proportional to the number of potential hits. (I am saying "potential hits" because I am limiting

Re: Query rewriting/optimization

2009-06-18 Thread Chris Hostetter
: > ((Src:Testing Dst:Test) (Src:Test2 Port:http)). : > In this case, would Lucene optimize to remove the unwanted BooleanQueries ? : Alas, Lucene in general does not do such structural optimization (and : I agree, we should). EG we could do it during Query.rewrite(). Except that flattening Boo

Re: gui based app/user interface apps..

2009-06-18 Thread Otis Gospodnetic
Bruce, I think you should ask on nutch-user instead of on this java-user Lucene list. You will see a link to a Windows app for managing Nutch on the Nutch Wiki. Nutch used to have a UI, but I'm afraid Nutch 1.0 no longer has it. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch -

Re: Same score for different fields

2009-06-18 Thread Otis Gospodnetic
Nada, Scores and norms are two different things. If you look for the Lucene class called DefaultSimilarity you will see how norms are computed: public float computeNorm(String field, FieldInvertState state) { final int numTerms; if (discountOverlaps) numTerms = state.getLength(
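As a rough illustration of why two fields can end up with the same norm, here is a standalone sketch of the default length normalization, assuming the usual formula of field boost multiplied by 1/sqrt(number of terms). `LengthNormSketch` is a hypothetical class, not the Lucene source, and it ignores `discountOverlaps`. It also does not model the fact that Lucene stores each norm in a single byte, which rounds the displayed values coarsely.

```java
// Hypothetical standalone sketch; not the DefaultSimilarity source.
class LengthNormSketch {
    /** Length normalization: 1 / sqrt(numTerms), so longer fields get smaller norms. */
    static float lengthNorm(int numTerms) {
        if (numTerms < 1) {
            throw new IllegalArgumentException("numTerms must be >= 1");
        }
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    /** Norm before byte encoding: field boost multiplied by the length normalization. */
    static float norm(float boost, int numTerms) {
        return boost * lengthNorm(numTerms);
    }
}
```

Under this formula, any two fields with the same boost and the same number of terms get the same norm regardless of which terms they contain, because the norm depends only on field length and boost.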

Same score for different fields

2009-06-18 Thread Nada Mimouni
Hi, I have created a Lucene index with two fields. Let's take this example entry from my index as displayed by Luke: Field | Norm |Value | 0.375| average | 0.375| sal

gui based app/user interface apps..

2009-06-18 Thread bruce
hi list... sorry to post here, but i figured you might be able to help... i'm working on a project, that deals with building a crawler, and i'm working out the details for the mgmt app for the crawler. i'm currently looking at how to deal with the status/actions of the crawler, and how the differ

Re: how to deal with too many clause error in boolean query.

2009-06-18 Thread Michael McCandless
On Wed, Jun 17, 2009 at 3:32 PM, Tim Williams wrote: > On Wed, Jun 17, 2009 at 3:16 PM, vanshi wrote: >> >> Hello all, >> >> I have a situation where a field is indexed like this >> (FAC_NAME(Field.Store.NO, Field.Index.NO_NORMS)) and keyword analyzer is >> used on this field. Although, I'm aware t

Re: Lucene performance: is search time linear to the index size?

2009-06-18 Thread Erick Erickson
Opening a searcher and doing the first query incurs a significant amount of overhead, cache loading, etc. Inferring search times relative to index size with a program like you describe is unreliable. Try firing a few queries at the index without measuring, *then* measure the time it takes for subs
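The warm-up advice above can be sketched as a small timing harness. This is a minimal sketch, not a rigorous benchmark: `WarmTiming` and `averageNanos` are hypothetical names, and the `Runnable` stands in for whatever search call is being measured against an already opened searcher.

```java
// Hypothetical timing helper; the Runnable is a stand-in for the real query call.
class WarmTiming {
    /**
     * Runs the query warmupRuns times without timing it, then measuredRuns
     * times with timing, and returns the average nanoseconds per measured run.
     */
    static long averageNanos(Runnable query, int warmupRuns, int measuredRuns) {
        if (warmupRuns < 0 || measuredRuns < 1) {
            throw new IllegalArgumentException("need warmupRuns >= 0 and measuredRuns >= 1");
        }
        // Untimed runs absorb one-off costs such as cache loading.
        for (int i = 0; i < warmupRuns; i++) {
            query.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredRuns; i++) {
            query.run();
        }
        long elapsed = System.nanoTime() - start;
        return elapsed / measuredRuns;
    }
}
```

Opening the searcher belongs outside the `Runnable`, so that neither the warm-up runs nor the measured runs include the start-up cost.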