Re: Analyzing performance and memory consumption for boolean queries

2009-06-23 Thread Otis Gospodnetic
Nigel, Based on the description, I'd suspect unnecessarily(?) large JVM heap and insufficient RAM for caching the actual index. Run vmstat while querying the index and watch columns: bi, bo, si, so, wa, and id. :) If what I said above is correct, then you should see more data loaded from dis

Re: Analyzing performance and memory consumption for boolean queries

2009-06-23 Thread Ken Krugler
Hi Chris, Others on this list will be able to provide much better optimization suggestions, but based on my experience with some large indexes... 1. For search time to vary from < 1 second => 20 seconds, the only two things I've seen are: * Serious JVM garbage collection problems. * You're

Analyzing performance and memory consumption for boolean queries

2009-06-23 Thread Nigel
Our query performance is surprisingly inconsistent, and I'm trying to figure out why. I've realized that I need to better understand what's going on internally in Lucene when we're searching. I'd be grateful for any answers (including pointers to existing docs, if any). Our situation is this: We

Re: ScoreDocComparator to FieldComparator

2009-06-23 Thread Michael McCandless
RelevanceComparator (in FieldComparator.java, in Lucene's sources) is probably a good starting point. Basically, you have to hold onto the scorer, ask it for the score of each doc, and then record the scores (& docIDs, and anything else you need to do your comparison) privately.

Re: Similarity

2009-06-23 Thread Shashi Kant
http://code.google.com/p/semanticvectors/ If you search the archives of this mailing-list, there have been plenty of discussions in the past about LSI/LSA & Lucene. On Tue, Jun 23, 2009 at 6:55 AM, Cool The Breezer wrote: > > Shashi, > I think I am planning or intended to do the same t

Re: Similarity

2009-06-23 Thread Cool The Breezer
Shashi, I think I am planning to do the same thing as implemented in the LSI methodology. It seems from your message that you or somebody else might have used the LSI approach in Lucene. Could you share some of your work? I am most interested in any library, package, or paper us

Re: SegmentReader retaining memory

2009-06-23 Thread Michael McCandless
I agree we should do something about this. Actually I think we should simply remove finalize() from Directory[Index]Reader. Can you open a Jira issue? Thanks. Mike On Mon, Jun 22, 2009 at 6:32 PM, Groose, Brian wrote: > In the application I'm working on, I'm opening a new index every 15-20 > m

ScoreDocComparator to FieldComparator

2009-06-23 Thread Raimon Bosch
Hi! We are migrating from ScoreDocComparator to FieldComparator in order to get better performance and try its new features. I was wondering how we can access a document's ScoreDoc inside FieldComparator. Can we use FieldComparator as a ScoreDocComparator? Thanks in advance, Raimon

Re: Lucene directory copy - master copy to local index

2009-06-23 Thread Ian Lea
Hi, You could look at the Solr scripts, I guess (the ones that use rsync, not the new Java-only method). But it isn't that hard. See the rsync docs for options and how to get it up and running (email me off list if you like - that has nothing to do with Lucene); then you need to fire off an rsync cop

Re: Similarity

2009-06-23 Thread Shashi Kant
I suspect what you are looking for is "Latent Semantics" - it can algorithmically infer that "iPod~iPhone" or "Apple~Steve Jobs". Google for "Latent Semantic Indexing" or "Latent Semantic Analysis" - you can apply some of those approaches using the TermVectors in Lucene index. Ontologies such as Wo

Re: Lucene directory copy - master copy to local index

2009-06-23 Thread Amin Mohammed-Coleman
Hi, Sorry for sending the below. What I meant to ask was: is there any documentation I can be pointed to on using Lucene with rsync? I've been up since 2am, so my brain is slowing down really quickly... Cheers, Amin On Tue, Jun 23, 2009 at 10:13 AM, Amin Mohammed-Coleman wrote: > Hi > > Thanks for yo

Similarity

2009-06-23 Thread Cool The Breezer
Of late I have started using Lucene as the main search library for all documents on our intranet. It works extremely well. I am trying to use similarity-style functionality to find the similarity between two sentences/documents, and I am trying to use WordNet in our search solution. I have used WordNet con

Re: Lucene directory copy - master copy to local index

2009-06-23 Thread Amin Mohammed-Coleman
Hi, Thanks for your replies. Is there any documentation that I can look at for using rsync? I am thinking of creating my own solution (early days), though I might come back to Solr. Cheers, Amin On Mon, Jun 22, 2009 at 10:24 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > Solr (as of 1.4)