Re: heap memory issues when sorting by a string field

2009-12-09 Thread Ganesh
I think this problem will happen for all sorted fields. I am sorting on an integer field. I ran a small test and found that, after closing all the Databases, the WeakHashMap and int[] are not released. Please find the profiler screenshot attached. Is there any way to release this memory / How to fix it ex

Re: Converting HitCollector to Collector

2009-12-09 Thread Shai Erera
Hi Max, In 3.0.0 (actually in 2.9.0 already), Lucene moved to execute its searches one sub-reader at a time. As a consequence, absolute docIDs are no longer passed to the collect method; it receives docIDs relative to that reader instead. For example, suppose you have 2 segments, with 6 documents tot
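
The per-segment docID arithmetic described in this reply can be sketched without Lucene on the classpath. The class below is a hypothetical stand-in, not a Lucene class: it only mirrors the idea that the searcher announces each sub-reader together with its docBase, then reports hits with docIDs relative to that sub-reader. The segment sizes (4 and 2 documents) are an arbitrary illustration, not the split from the original message.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a per-segment collector; NOT part of Lucene.
// It mimics two callbacks: one per sub-reader (with that reader's docBase)
// and one per hit (with a docID relative to the current sub-reader).
public class DocBaseSketch {
    private int docBase; // offset of the current segment in the whole index
    private final List<Integer> absoluteDocIds = new ArrayList<>();

    // Called once before any hits from a new segment are reported.
    public void setNextReader(int docBase) {
        this.docBase = docBase;
    }

    // Called once per hit; relativeDoc is local to the current segment.
    public void collect(int relativeDoc) {
        absoluteDocIds.add(docBase + relativeDoc);
    }

    public List<Integer> getAbsoluteDocIds() {
        return absoluteDocIds;
    }

    public static void main(String[] args) {
        DocBaseSketch collector = new DocBaseSketch();

        collector.setNextReader(0); // first segment (assumed: 4 documents)
        collector.collect(1);
        collector.collect(3);

        collector.setNextReader(4); // second segment starts after those 4
        collector.collect(0);
        collector.collect(1);

        System.out.println(collector.getAbsoluteDocIds()); // prints [1, 3, 4, 5]
    }
}
```

In a real Collector the same addition (docBase plus the docID passed to collect) is what recovers the index-wide docID; everything else in the sketch is scaffolding.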

RE: heap memory issues when sorting by a string field

2009-12-09 Thread Toke Eskildsen
Thanks for the heads-up, TCK. The Dietz & Sleator article I found at http://www.cs.cmu.edu/~sleator/papers/maintaining-order.pdf looks very interesting. String sorting in Lucene is indeed fairly expensive and we've experimented with two solutions to this, neither of which is a silver bullet. 1) Sto

Re: heap memory issues when sorting by a string field

2009-12-09 Thread Michael McCandless
It's not that it's "necessary" -- this is just how Lucene's sorting has always worked ;) But, it's just software! You could whip up a patch... I'm not familiar with the order-maintenance problem & solutions offhand, but it certainly sounds interesting. One issue is that loading only certain val

Re: NearSpansUnordered payloads not returning all the time

2009-12-09 Thread Michael McCandless
Yes, you found it! Is that what you're hitting? I don't know of a workaround though... this is just how SpanQuery currently works... Mike On Wed, Dec 9, 2009 at 4:56 PM, Jason Rutherglen wrote: > Mike, > > Is this the thread? > > http://www.lucidimagination.com/search/document/1e87d488a904b89f

RE: MatchAllDocsQuery and InstantiatedIndex on Lucene 2.9.1

2009-12-09 Thread Uwe Schindler
This is a bug in InstantiatedIndex. termDocs(null) was added to get all documents. This was never implemented in InstantiatedIndex. Can you open an issue? There may be other queries that fail because of this (e.g. FieldCacheRangeFilter,...). - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen

Re: heap memory issues when sorting by a string field

2009-12-09 Thread TCK
Thanks Mike for opening this jira ticket and for your patch. Explicitly removing the entry from the WHM definitely does reduce the number of GC cycles taken to free the huge StringIndex objects that get created when doing a sort by a string field. But I'm still trying to figure out why it is neces
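
A minimal sketch of the idea in this message, assuming a cache shaped like the one being discussed: large per-reader arrays stored in a WeakHashMap keyed by a reader-like object. The class and method names are hypothetical and this is not Lucene's FieldCache code. It only illustrates why an explicit remove helps: the entry is gone at once, instead of waiting for the collector to notice that the key is unreachable.

```java
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical cache, NOT Lucene's FieldCache: reader-like key -> large array.
public class WeakCacheSketch {
    private final Map<Object, int[]> cache = new WeakHashMap<>();

    // Returns the cached array for this key, building it on first use.
    public int[] get(Object readerKey, int maxDoc) {
        int[] values = cache.get(readerKey);
        if (values == null) {
            values = new int[maxDoc]; // stands in for the expensive sort data
            cache.put(readerKey, values);
        }
        return values;
    }

    // Explicit eviction, e.g. when the reader is closed. Without this call the
    // entry lingers until the GC clears the weak key and the map is touched again.
    public void purge(Object readerKey) {
        cache.remove(readerKey);
    }

    public int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        WeakCacheSketch cache = new WeakCacheSketch();
        Object reader = new Object(); // hypothetical reader handle

        cache.get(reader, 1000);
        System.out.println(cache.size()); // prints 1

        cache.purge(reader);              // drop the entry right away
        System.out.println(cache.size()); // prints 0
    }
}
```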

Re: NearSpansUnordered payloads not returning all the time

2009-12-09 Thread Jason Rutherglen
Mike, Is this the thread? http://www.lucidimagination.com/search/document/1e87d488a904b89f/spannearquery_s_spans_payloads#8103efdc9705a763 Maybe we need a recommended workaround for this? Jason On Wed, Dec 9, 2009 at 1:17 PM, Michael McCandless wrote: > That sounds familiar... try to track do

Re: NearSpansUnordered payloads not returning all the time

2009-12-09 Thread Michael McCandless
That sounds familiar... try to track down the last thread maybe? I think it was this: if the payload was already retrieved for a prior span then the current span won't be able to retrieve it, so even though you know a payload falls within the span you're looking at, you won't get it back, if it al

Converting HitCollector to Collector

2009-12-09 Thread Max Lynch
Hi, I have a HitCollector that processes all hits from a query. I want all hits, not the top N hits. I am converting my HitCollector to a Collector for Lucene 3.0.0, and I'm a little confused by the new interface. I assume that I can implement my new Collector much like the code on the API Docs:

MatchAllDocsQuery and InstantiatedIndex on Lucene 2.9.1

2009-12-09 Thread Jason Fennell
I'm trying to upgrade our application from Lucene 2.4.1 to Lucene 2.9.1. I've been using an InstantiatedIndex to do a bunch of unit testing, but am running into some problems with Lucene 2.9.1. In particular, when I try to run a MatchAllDocsQuery on my InstantiatedIndex (which worked fine on 2.4.

Re: index reader for multiple indexes

2009-12-09 Thread David Causse
On Fri, Oct 02, 2009 at 11:40:09PM -0700, m.harig wrote: > > Thanks Uwe Schindler , > > If i use an IndexReader[] to use MultiReader , will it be thread > safe? because i've to reopen my IndexReader to check whether my index is > updated or not . In this case how do i handle it? please sug

Re: NearSpansUnordered payloads not returning all the time

2009-12-09 Thread Jason Rutherglen
Right we're getting the spans, however it's just the payloads that are missing, randomly... On Wed, Dec 9, 2009 at 2:23 AM, Michael McCandless wrote: > There was a thread a while back about how span queries don't enumerate > every possible span, but I can't remember if that included sometimes > m

RE: Index file compatibility and a migration plan to lucene 3

2009-12-09 Thread Rob Staveley (Tom)
> Don't you have a playground to properly test your changes Yes, I'll be doing a practice run in a DEV cluster. It is the practice run that I'm planning at this point. Many thanks for your pointers, Danil. -Original Message- From: Danil ŢORIN [mailto:torin...@gmail.com] Sent: 09 Dece

Re: Index file compatibility and a migration plan to lucene 3

2009-12-09 Thread Danil ŢORIN
There is a LOT of deprecated stuff in 2.9.1 (but it's still there) and your code should run as it is (however, there are some changes in behavior, so read CHANGES.txt carefully). In 3.0 this old stuff is removed. Your production readers may not even start (which I guess is more painful than 2 step

RE: Index file compatibility and a migration plan to lucene 3

2009-12-09 Thread Rob Staveley (Tom)
COMPRESS is supported (only deprecated) in 2.9.1, so I'm expecting them to be supported http://lucene.apache.org/java/2_9_1/api/all/org/apache/lucene/document/Field.Store.html#COMPRESS I guess I should expect optimize() to increase the size of the index as compressed fields are expanded as it s

Re: Index file compatibility and a migration plan to lucene 3

2009-12-09 Thread Danil ŢORIN
The 2nd point can be achieved simply by an optimize (which will read the old segments and create a new one). But I'm not sure how it handles compressed fields. On Wed, Dec 9, 2009 at 16:50, Rob Staveley (Tom) wrote: > Thanks, Danil. I think you've saved me a lot of time. Weiwei too - converting > r

RE: Index file compatibility and a migration plan to lucene 3

2009-12-09 Thread Rob Staveley (Tom)
Thanks, Danil. I think you've saved me a lot of time. Weiwei too - converting rather than reindexing everything, which will save a lot of time. So, I should do this: 1. Convert readers to 2.9.1, which should be able to read any 2.x index including the existing 2.3.1 indexes 2. Convert writers t

Re: Index file compatibility and a migration plan to lucene 3

2009-12-09 Thread Danil ŢORIN
You NEED to update your readers first, or else they will be unable to read files created by newer version. And trust me, there are changes in index format from 2.3 -> 2.9 On Wed, Dec 9, 2009 at 15:11, Weiwei Wang wrote: > Hi, Rob, > I read > http://wiki.apache.org/lucene-java/BackwardsCompatibili

Re: Index file compatibility and a migration plan to lucene 3

2009-12-09 Thread Weiwei Wang
Hi, Rob, I read http://wiki.apache.org/lucene-java/BackwardsCompatibility#File_Formats and found no compatibility guarantee for IndexWriter between different versions. You can run your idea as a test and see the output. If it doesn't work, I suggest you convert your index to the new version as I said i

RE: Index file compatibility and a migration plan to lucene 3

2009-12-09 Thread Rob Staveley (Tom)
Thanks for the swift response, Weiwei. In my deployment, my index readers are in a data centre and therefore more difficult to upgrade than the writers. That's why I wanted to start with the writers rather than the readers. I realise that it looks the wrong way round and http://wiki.apache.org/

Re: Index file compatibility and a migration plan to lucene 3

2009-12-09 Thread Weiwei Wang
I’ve finished an upgrade from 2.4.1 to 3.0.0. What I did is this: 1. Upgrade my user-defined analyzer, tokenizer and filter to 3.0.0 2. Use a 3.0.0 IndexReader to read the old version index and then use a 3.0.0 IndexWriter to write all the documents into a new index 3. Update QueryParser to 3.0.0

Index file compatibility and a migration plan to lucene 3

2009-12-09 Thread Rob Staveley (Tom)
I have Lucene 2.3.1 code and indexes deployed in production in a distributed system and would like to bring everything up to date with 3.0.0 via 2.9.1. Here's my migration plan: 1. Add an index writer which generates a 2.9.1 "test" index 2. Have that "test" index writer push that 2.9.1 "test" ind

Re: NearSpansUnordered payloads not returning all the time

2009-12-09 Thread Michael McCandless
There was a thread a while back about how span queries don't enumerate every possible span, but I can't remember if that included sometimes missing payloads... Mike On Tue, Dec 8, 2009 at 7:34 PM, Jason Rutherglen wrote: > Howdy, > > I am wondering if anyone has seen > NearSpansUnordered.getPayl

Re: FileNotFoundException on index

2009-12-09 Thread Michael McCandless
OK, thanks for bringing closure! Accidentally allowing 2 writers to write to the same index quickly leads to corruption. They are like the Betta fish: they fight to the death, removing each other's files, if you put them in the same cage. Mike On Wed, Dec 9, 2009 at 1:56 AM, Max Lynch wrote: > H
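
Lucene guards against a second writer with its own write lock in the index directory. The sketch below is not that implementation; it is a hypothetical, simplified guard that shows the general single-writer idea with an atomically created marker file. The class name, method names, and marker file name are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical single-writer guard; NOT Lucene's locking code.
public class SingleWriterGuardSketch {
    // Assumed marker name, chosen only for this sketch.
    private static final String LOCK_NAME = "writer.lock.sketch";

    // Returns true if this caller became the only writer for indexDir.
    public static boolean tryAcquire(Path indexDir) throws IOException {
        try {
            // createFile checks for existence and creates in one atomic step.
            Files.createFile(indexDir.resolve(LOCK_NAME));
            return true;
        } catch (FileAlreadyExistsException e) {
            return false; // another writer already holds the marker
        }
    }

    // Releases the marker so a later writer can acquire it.
    public static void release(Path indexDir) throws IOException {
        Files.deleteIfExists(indexDir.resolve(LOCK_NAME));
    }

    public static void main(String[] args) throws IOException {
        Path indexDir = Files.createTempDirectory("index-sketch");

        System.out.println(tryAcquire(indexDir)); // prints true
        System.out.println(tryAcquire(indexDir)); // prints false: second writer refused

        release(indexDir);
        System.out.println(tryAcquire(indexDir)); // prints true again after release

        release(indexDir);
        Files.delete(indexDir);
    }
}
```

A marker file like this survives a crashed process, so a real deployment also needs a policy for stale locks; the sketch leaves that out.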

RE: [VOTE] Push fast-vector-highlighter mvn artifacts for 3.0.0 and 2.9.1

2009-12-09 Thread Uwe Schindler
Hi all, The missing maven artifacts for the fast-vector-highlighter contrib of Lucene Java in version 2.9.1 and 3.0.0 are now available at: http://repo1.maven.org/maven2/org/apache/lucene/ http://repo2.maven.org/maven2/org/apache/lucene/ Uwe - Uwe Schindler uschind...@apache.org Apache Luc