ParallelReader and updateDocument don't play nice?

2011-02-22 Thread Groose, Brian
I have been looking at using ParallelReader as its documentation indicates, to allow certain fields to be updated while most of the fields will not be updated. However, this does not seem possible. Let's say I have two indexes, A and B, which are used in a ParallelReader. If I update a documen

Are you going to enter Google Summer of Code?

2011-02-22 Thread Lasantha Bandara
Hi, When I searched for Java projects that could be accepted for GSoC 2011, I found Lucene. It matches my interests nicely. Could you please tell me whether you are going to take part this time? If yes, what kind of ideas will you present? I would like to start quite early working on this

Re: IndexWriter.close() performance issue

2011-02-22 Thread Mark Kristensson
I'm resurrecting this old thread because this issue is now reaching a critical point for us and I'm going to have to modify the Lucene source code for it to continue to work for us. Just a quick refresher: we have one index with several hundred thousand unique field names and found that opening an

[Announce] RankingAlgorithm ver 1.1

2011-02-22 Thread Nagendra Nagarajayya
Hi! I would like to announce the release of RankingAlgorithm ver 1.1 and would like to invite you to try it out. It is very good and does not need any changes to your existing indexes but the way they are accessed, ranked and scored changes. This version has Score Boosting enabling Document

Re: Serialization of Lucene Document objects

2011-02-22 Thread Erik Fäßler
Hi Simon, thanks for your answer. My comments below: so you mean you would want to do that analysis on the client side and only shoot the already tokenized values to the server? What exactly is too slow? Can you provide more info what the problem is? After all I think you should ask on the sol

Re: Serialization of Lucene Document objects

2011-02-22 Thread Simon Willnauer
On Tue, Feb 22, 2011 at 2:58 PM, Erik Fäßler wrote: >  Hi there, > > I'd like to serialize some Lucene Documents I've built before. My goal is to > send the documents over a http connection to a Solr server which then should > add them to its index. ok so why do you build lucene documents if you

Serialization of Lucene Document objects

2011-02-22 Thread Erik Fäßler
Hi there, I'd like to serialize some Lucene Documents I've built before. My goal is to send the documents over a http connection to a Solr server which then should add them to its index. I thought this would work as the Document class implements Serializable as do the Fields. Unfortunately,

Re: Suggest search terms

2011-02-22 Thread Fernando Wasylyszyn
Well, actually it depends. If your suggestion terms correspond to the terms in your "main" index, then you can use TermEnum#docFreq(). Otherwise, if you develop a separate index for the suggestions (that do not correspond to the terms in your main index), then you can just add a calculat

Re: recurrent IO/CPU peaks

2011-02-22 Thread Michael McCandless
On Tue, Feb 22, 2011 at 3:15 AM, wrote: > Here is how long it took for each run : >  - default : run 1 = 55 minutes, run 2 = 59 minutes >  - balanced : run 1 = 145 minutes, run 2 = 121 minutes > > Is that an expected behavior? Hmm BalancedSegmentMergePolicy was over 2X slower to optimize...? Th

Re: Suggest search terms

2011-02-22 Thread Simon Willnauer
On Tue, Feb 22, 2011 at 11:23 AM, Clemens Wyss wrote: > Fernando, Uwe thanks for your suggestions. > Is it possible to get the number of "hits" per term? > ferrari (125) > lamborghini (34) > ... I think you can just call TermEnum#docFreq(), no? simon > >> -Original Message- >> From

AW: Suggest search terms

2011-02-22 Thread Clemens Wyss
Fernando, Uwe thanks for your suggestions. Is it possible to get the number of "hits" per term? ferrari (125) lamborghini (34) ... > -Original Message- > From: Fernando Wasylyszyn [mailto:ferw...@yahoo.com.ar] > Sent: Monday, 21 February 2011 21:11 > To: java-user@lucene.apache.

Re: lucene3.0.3 | get correct document in case of multiple Boolean query in search criteria

2011-02-22 Thread Ranjit Kumar
Hi, As mentioned above, I am using a query like: criteria = (sql OR sqlserver OR "sql server") AND java AND delphi. In the above scenario I need hits (documents) containing at least one occurrence of (sql OR sqlserver OR "sql server"). Also, java and delphi must be present in the document. Still I have not g

Re: recurrent IO/CPU peaks

2011-02-22 Thread v . sevel
Hi, I did some tests with the BalancedSegmentMergePolicy, looking specifically at the optimize. I have an index that is 70 GB in size and contains around 35 million documents. I duplicated the index 4 times, and I ran 2 optimizes with the default merge policy, and 2 with the balanced policy. He

Re: Lucene TermVector

2011-02-22 Thread Simon Willnauer
Hey, On Mon, Feb 21, 2011 at 8:56 PM, Ajay Anandan wrote: > Hi > I am trying to implement an Expectation Maximization algorithm for document > clustering.  I am planning to use Lucene Term Vectors for finding similarity > between 2 documents.  There are 2 kinds of EM algos using naive Bayes: the