Re: best practice for reusing documents with multi-valued fields

2011-04-18 Thread Anshum
Hi Chris, doc.removeFields works fine; I just tried it again. You could try an approach along the lines of the one below. *--snip--* IndexWriter iw = new IndexWriter(indexDir, new StandardAnalyzer(Version.LUCENE_30), true, MaxFieldLength.UNLIMITED); Document doc = new Document(); doc
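The quoted snippet is cut off, but the idea can be sketched as follows. This is a minimal, hedged reconstruction assuming Lucene 3.0.x; the field names `id` and `tag`, and the use of `updateDocument`, are illustrative assumptions, not from the original mail:

```java
import java.util.Arrays;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriter.MaxFieldLength;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class ReuseDocSketch {
    public static void main(String[] args) throws Exception {
        RAMDirectory indexDir = new RAMDirectory();
        IndexWriter iw = new IndexWriter(indexDir,
                new StandardAnalyzer(Version.LUCENE_30), true, MaxFieldLength.UNLIMITED);

        // Build a document with a multi-valued "tag" field (two Field instances, same name).
        Document doc = new Document();
        doc.add(new Field("id", "42", Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("tag", "red", Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("tag", "blue", Field.Store.YES, Field.Index.NOT_ANALYZED));
        iw.addDocument(doc);

        // Reuse the same Document instance: removeFields drops ALL values of the
        // named field, then fresh values can be added before re-indexing.
        doc.removeFields("tag");
        doc.add(new Field("tag", "green", Field.Store.YES, Field.Index.NOT_ANALYZED));
        System.out.println(Arrays.toString(doc.getValues("tag")));

        // Replace the previously indexed copy keyed on the "id" term.
        iw.updateDocument(new Term("id", "42"), doc);
        iw.close();
    }
}
```

The key point from the thread is that `removeFields(name)` clears every value of a multi-valued field at once, so the same `Document` object can be refilled and re-indexed.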

How to make search distributed and scalable

2011-04-18 Thread Weiwei Wang
Hi buddies, I'm reading about Solr and elastic-search; the thing I have been curious about is how to make a search engine distributed (using something like Hadoop?). I read about the shard and replication techniques mentioned in the user guide, but what is lacking for the open sour

Re: Calculate document lucene score after the search

2011-04-18 Thread Anshum
Hi Madhu, You could use IndexSearcher.explain(..) to explain a result and get a detailed breakdown of its score. That should help you understand the boost and score as calculated by Lucene for your app. -- Anshum Gupta http://ai-cafe.blogspot.com On Tue, Apr 19, 2011 at 2:32 A
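A minimal sketch of the suggestion, assuming Lucene 3.0.x (the field name `title` and the boost value are illustrative): index-time boosts are folded into the field norm and are not recoverable via `document.getBoost()`, but `IndexSearcher.explain` shows how they enter the score.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriter.MaxFieldLength;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class ExplainSketch {
    public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter iw = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_30),
                true, MaxFieldLength.UNLIMITED);
        Document doc = new Document();
        Field title = new Field("title", "lucene in action",
                Field.Store.YES, Field.Index.ANALYZED);
        title.setBoost(2.0f); // folded into the norm at index time
        doc.add(title);
        iw.addDocument(doc);
        iw.close();

        IndexSearcher searcher = new IndexSearcher(dir);
        Query q = new TermQuery(new Term("title", "lucene"));
        TopDocs hits = searcher.search(q, 10);
        ScoreDoc hit = hits.scoreDocs[0];
        System.out.println("score = " + hit.score);

        // The Explanation breaks the score into tf, idf, norm (boost), etc.
        Explanation expl = searcher.explain(q, hit.doc);
        System.out.println(expl.toString());
        searcher.close();
    }
}
```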

Calculate document lucene score after the search

2011-04-18 Thread madhuri_1820
Hi, I am trying to find the actual Lucene score of a document after the search. I have set different boost values on fields. I am using document.getBoost() to find the score, but I am getting a document boost of 1 for all documents. Is there any way I can calculate the actual score of the d

Re: switching between Query parsers

2011-04-18 Thread Trejkaz
On Thu, Apr 14, 2011 at 9:44 PM, shrinath.m wrote: > Consider this case : > > Lucene index contains documents with these fields : > title > author > publisher > > I have coded my app to use MultiFieldQueryParser so that it queries all > fields. > Now if user types something like "author:tom" in s
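A small sketch of the behavior being discussed, assuming Lucene 3.0.x: `MultiFieldQueryParser` expands a bare term across all configured fields, while an explicit `field:term` prefix in the query string overrides that expansion, so a separate parser is often unnecessary.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

public class ParserPrefixSketch {
    public static void main(String[] args) throws Exception {
        String[] fields = { "title", "author", "publisher" };
        MultiFieldQueryParser parser = new MultiFieldQueryParser(
                Version.LUCENE_30, fields, new StandardAnalyzer(Version.LUCENE_30));

        // A bare term is expanded into a disjunction over all three fields.
        Query broad = parser.parse("tom");
        System.out.println(broad);

        // An explicit field prefix restricts the query to that field only.
        Query narrow = parser.parse("author:tom");
        System.out.println(narrow);
    }
}
```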

StandardTokenizer question

2011-04-18 Thread Mindaugas Žakšauskas
Hi, Given the code is running under Lucene 3.0.1 8<-- import java.io.IOException; import java.io.Reader; import java.io.StringReader; import org.apache.lucene.analysis.Analyzer; import org.apache.lucene.analysis.StopAnal
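The quoted code is truncated, but a minimal self-contained version of this kind of tokenizer test under Lucene 3.0.x looks like the following (the field name `f` and the sample text are made up; 3.0 uses `TermAttribute`, which was later replaced by `CharTermAttribute`):

```java
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.util.Version;

public class TokenDump {
    public static void main(String[] args) throws Exception {
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
        TokenStream ts = analyzer.tokenStream("f",
                new StringReader("foo-bar 3.14 user@example.com"));
        // Attribute-based API: ask the stream for its term attribute once,
        // then read it after each incrementToken() call.
        TermAttribute term = ts.addAttribute(TermAttribute.class);
        while (ts.incrementToken()) {
            System.out.println(term.term());
        }
        ts.close();
    }
}
```

Printing the emitted terms like this is the usual way to see exactly how StandardTokenizer splits (or keeps) hyphenated words, numbers, and e-mail addresses.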

Re: German*Filter, Analyzer "cutting" off letters from (french) words...

2011-04-18 Thread Erick Erickson
You can easily string together your own tokenizer and any number of filters to create an analyzer that does exactly what you need. Lucene In Action shows an example of creating your own analyzer by assembling the standard parts. Best, Erick On Mon, Apr 18, 2011 at 3:08 AM, Clemens Wyss wrote:
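A minimal sketch of such a hand-assembled analyzer under Lucene 3.0.x, chaining the standard parts while deliberately omitting any language-specific stemming filter (the class name is made up; `StopFilter(boolean, TokenStream, Set)` is the 3.0-era constructor):

```java
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.util.Version;

// Tokenize, lowercase, and remove stop words -- but no GermanStemFilter,
// so French words passing through are not mangled by German stemming rules.
public class PlainAnalyzer extends Analyzer {
    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream ts = new StandardTokenizer(Version.LUCENE_30, reader);
        ts = new LowerCaseFilter(ts);
        ts = new StopFilter(true, ts, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
        return ts;
    }
}
```

Each filter wraps the previous stream, so adding, removing, or reordering stages is just a matter of editing this chain.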

Re: Choosing boosting in Lucene

2011-04-18 Thread Anshum
Hi Cristina, Lucene scores each doc per search based on its scoring formula. As there is a lot of query-related normalization and there are other components involved, the scores for docs change as the query changes. To understand in detail how boosting affects the score, you may read about *lucene scoring* at http

Re: What doc id to use on IndexReader with SetNextReader

2011-04-18 Thread Antony Bowesman
Thanks Uwe, I assumed as much. On 18/04/2011 7:28 PM, Uwe Schindler wrote: Document d = reader.document(doc) This is the correct way to do it. Uwe

RE: What doc id to use on IndexReader with SetNextReader

2011-04-18 Thread Uwe Schindler
Hi, > Document d = searcher.getIndexReader.document(doc + docBase) This works, of course, but is somewhat wasteful, as it transforms the ID twice and that slows things down a little. Inside collectors you should *only* use the IndexReader and Scorer given by the setNextReader / setScorer calls. >
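Uwe's advice can be sketched as a 2.9-style `Collector` skeleton (the class name is made up; assuming Lucene 2.9.x, where `collect()` receives per-segment doc IDs):

```java
import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

// collect() gets per-segment doc IDs, so fetch documents through the segment
// reader passed to setNextReader. Only add docBase when an index-wide ID is
// needed (e.g. to record hits in a top-level BitSet).
public class FetchingCollector extends Collector {
    private IndexReader segmentReader;
    private int docBase;

    @Override
    public void setScorer(Scorer scorer) {
        // keep a reference here if scores are needed in collect()
    }

    @Override
    public void setNextReader(IndexReader reader, int docBase) {
        this.segmentReader = reader;
        this.docBase = docBase;
    }

    @Override
    public void collect(int doc) throws IOException {
        // Correct: per-segment reader with the per-segment doc ID.
        Document d = segmentReader.document(doc);
        // Index-wide ID, only if you need one outside this segment's scope.
        int globalId = docBase + doc;
    }

    @Override
    public boolean acceptsDocsOutOfOrder() {
        return true;
    }
}
```

This avoids the double transformation Uwe describes: `searcher.getIndexReader().document(doc + docBase)` maps the segment ID up to a global ID only for the top-level reader to map it back down to a segment internally.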

What doc id to use on IndexReader with SetNextReader

2011-04-18 Thread Antony Bowesman
I am migrating some code from 2.3.2 to 2.9.4 and I have custom Collectors. Now there are multiple calls to collect, and each call needs to adjust the passed doc id by the docBase given in setNextReader. However, if you want to fetch the document in the collector, what docId/IndexReader combination s

Re: German*Filter, Analyzer "cutting" off letters from (french) words...

2011-04-18 Thread Clemens Wyss
What is the best way to "avoid" the lowercasing (and still be able to exclude stop words)? > -Original Message- > From: Simon Willnauer [mailto:simon.willna...@googlemail.com] > Sent: Friday, 15 April 2011 08:56 > To: java-user@lucene.apache.org > Subject: Re: German*Filter
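One way to answer the question above, sketched for Lucene 3.0.x: drop `LowerCaseFilter` from the chain entirely and make the stop set itself case-insensitive via `CharArraySet` with `ignoreCase=true`. The class name and the short German stop-word list are illustrative assumptions:

```java
import java.io.Reader;
import java.util.Arrays;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.util.Version;

// No LowerCaseFilter, so tokens keep their original case; the stop set is
// case-insensitive, so "Der" and "der" are both removed all the same.
public class CasePreservingAnalyzer extends Analyzer {
    private final CharArraySet stopWords = new CharArraySet(
            Arrays.asList("der", "die", "das", "und"), /* ignoreCase= */ true);

    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream ts = new StandardTokenizer(Version.LUCENE_30, reader);
        return new StopFilter(true, ts, stopWords);
    }
}
```

Note that skipping lowercasing makes searches case-sensitive, so queries must match the indexed case (or the same analyzer must be used at query time).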