Re: Filters and multiple, per-segment calls to getDocIdSet

2010-03-25 Thread Daniel Noll
On Thu, Mar 25, 2010 at 21:41, Michael McCandless wrote: > > This depends on the particulars of filter... but in general you > shouldn't have to consume more RAM, I think?  Ie you should be able to > do your computation against the top-level reader, and then store the > results of your computation

custom low-level indexer (to speed things up) when fields, terms and docids are in order

2010-03-25 Thread britske
Hi, perhaps first some background: I need to speed up indexing for a particular application which has a pretty unusual schema: besides the normal stored and indexed fields we have about 20.000 fields per document which are all indexed/non-stored sInts. Obviously indexing was really slow wit

Strange issue with String vs. Query

2010-03-25 Thread Brian Pontarelli
I'm new to the list and I'm having an issue that I wanted to ask about quick. I'm using Lucene version 2.4.1. I recently rewrote a query to use the Query classes rather than a String and QueryParser. The search results between the two queries are now in different orders while the number of resul

Re: Garbage Collection performance on 2.9.2

2010-03-25 Thread Michael McCandless
Are you using IndexReader.reopen to open those new searchers? Can you capture a memory dump when GC kicks in? I'd be curious to see where all the "new" garbage is coming from... I don't think 2.3.2 -> 2.9.2 should be generating more garbage. Mike On Thu, Mar 25, 2010 at 3:55 PM, Siraj Haider w

Re: Fields with Field.Store.NO and Field.Index.ANALYZED not being indexed

2010-03-25 Thread Erick Erickson
I would be extraordinarily surprised if this was in Lucene, this is so basic to how it works that the howls would be heard world-round. So I'm guessing it's in your code. Could you show it to us? Or, better yet, create a small, self-contained test case that illustrates your problem? Also, what a

Re: Garbage Collection performance on 2.9.2

2010-03-25 Thread Siraj Haider
Indexing happens in a different thread at intervals. I open a new IndexWriter for each indexing session. After an indexing session, if there is a modification in the index, I close the searcher and open a new one. I have two searchers that I flip-flop when opening an index. On 3/25/2010 3:26 PM,

Fields with Field.Store.NO and Field.Index.ANALYZED not being indexed

2010-03-25 Thread Constantine Vetoshev
I have a strange problem with Field.Store.NO and Field.Index.ANALYZED fields with Lucene 3.0.1. I'm testing my app with twenty test documents. Each has about ten fields. All fields except one, "Content", are set as Field.Store.YES. The "Content" field is set as Field.Store.NO and Field.Index.ANALY

Slides from Finite-State Queries, Flexible Indexing, Scoring talk

2010-03-25 Thread Otis Gospodnetic
Hello everyone, Robert Muir gave a great presentation on a few advanced Lucene topics last night and even found time to send this presentation to me, which I just uploaded: http://www.slideshare.net/otisg/finite-state-queries-in-lucene You'll find all other presentations from the NYC Search

Re: Garbage Collection performance on 2.9.2

2010-03-25 Thread Michael McCandless
How do you reopen your searchers after indexing? Do you keep a single IW open for all time? Mike On Thu, Mar 25, 2010 at 3:11 PM, Siraj Haider wrote: > Indexing happen with frequent intervals on our indexes, but I think > searching is the cause of the issue, because as soon as the indexes are h

Re: Flex API - Debugging Segment Merge

2010-03-25 Thread Michael McCandless
On Thu, Mar 25, 2010 at 3:04 PM, Renaud Delbru wrote: > Hi Michael, > > On 25/03/10 18:45, Michael McCandless wrote: >> >> Hi Renaud, >> >> It's great that you're pushing flex forward so much :) You're making >> some cool sounding codecs!  I'm really looking forward to seeing >> indexing/searching

Re: Custom Filter

2010-03-25 Thread Siraj Haider
I figured this one out... it was due to a mistake in my code... sorry for the trouble. -siraj On 3/25/2010 5:48 AM, Ian Lea wrote: Could this maybe have something to do with per-segment readers, as mentioned in recent message from Daniel? Posting lucene version and the full stack trace dump is al

Re: Garbage Collection performance on 2.9.2

2010-03-25 Thread Siraj Haider
Indexing happens at frequent intervals on our indexes, but I think searching is the cause of the issue, because as soon as the indexes are hit with a lot of searches, the gc cycles become more frequent. -siraj On 3/24/2010 5:19 PM, Michael McCandless wrote: Is this during indexing or searchi

adapting lucene's practical scoring function

2010-03-25 Thread Mathias Silbermann
Dear Lucene Users, I'd like to use Lucene to find scientific papers in the index that are similar to a given paper from the index. This seems to be possible using the MoreLikeThis-feature or wrapping the given document in a query composed of several other queries (BooleanQuery). The similarity

Re: Flex API - Debugging Segment Merge

2010-03-25 Thread Renaud Delbru
Hi Michael, On 25/03/10 18:45, Michael McCandless wrote: Hi Renaud, It's great that you're pushing flex forward so much :) You're making some cool sounding codecs! I'm really looking forward to seeing indexing/searching performance results on Wikipedia... I'll share them for sure whenever

Re: Flex API - Debugging Segment Merge

2010-03-25 Thread Michael McCandless
Hi Renaud, It's great that you're pushing flex forward so much :) You're making some cool sounding codecs! I'm really looking forward to seeing indexing/searching performance results on Wikipedia... It sounds most likely there's a bug in the PFor impl? (Since you don't hit this exception with th

Flex API - Debugging Segment Merge

2010-03-25 Thread Renaud Delbru
Hi, I am currently benchmarking various compression algorithms using the Sep Codec, but I got an index corruption exception during the merge process, and I would need your help to debug it. I have reimplemented various algorithms like FOR, Simple9, VInt, PFor for the Sep IntBlock Codec. I am be

Re: Is anyone using SOLR in Australia?

2010-03-25 Thread Erick Erickson
No clue, but you might get more responses by asking on the SOLR users' list. You might be able to get something from the "Powered by SOLR" page at: http://wiki.apache.org/solr/PublicServers Best Erick On Thu, Mar 25, 2010 at 1:34 AM, Andrew Bruno wrote: > Hi all, > > I was wondering if anyo

Re: Range Queries Performance Hit

2010-03-25 Thread Ian Lea
That question should be asked on the clucene list. This is the java-user lucene list. -- Ian. On Thu, Mar 25, 2010 at 12:19 PM, wrote: > > Hi, > > Is there something of this sort  provided in clucene as well..lucene for > c++ ??? > > thanks, > Suman > >> No.  See java classes >> >> org.apach

Re: Range Queries Performance Hit

2010-03-25 Thread suman . holani
Hi, Is there something of this sort provided in clucene as well..lucene for c++ ??? thanks, Suman > No. See java classes > > org.apache.lucene.search.NumericRangeQuery > org.apache.lucene.document.NumericField > > See also recent thread on this list with subject "Lucene 3.0 Search > Performan

Re: Range Queries Performance Hit

2010-03-25 Thread Ian Lea
No. See java classes org.apache.lucene.search.NumericRangeQuery org.apache.lucene.document.NumericField See also recent thread on this list with subject "Lucene 3.0 Search Performance Stats". -- Ian. On Thu, Mar 25, 2010 at 11:53 AM, wrote: > > U mean I need to use padding technique in ind

RE: Range Queries Performance Hit

2010-03-25 Thread suman . holani
You mean I need to use a padding technique in indexing and searching in order to make numeric searches, right? For numbers 1...10 the indexes should be 01 02 ... 10 rather than 1 10 2 ... 9. thanks, Suman > You should use NumericRangeQuery and NumericField (since 2.9). > > - > Uwe Schindler > H.-H.-Me
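
The padding question above comes down to how terms sort as strings. The following is a minimal sketch in plain Java (no Lucene calls) that shows the difference between unpadded and zero-padded terms; the class name `PaddedTermOrder` and the method `sortedTerms` are hypothetical, chosen only for this illustration. Note that the reply in this thread recommends NumericRangeQuery and NumericField instead of manual padding.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration class; not part of Lucene.
public class PaddedTermOrder {

    // Renders each number in [from, to] as a decimal string, zero-padded to
    // `width` digits when width > 0, and returns the strings in
    // lexicographic (String.compareTo) order.
    static List<String> sortedTerms(int from, int to, int width) {
        List<String> terms = new ArrayList<>();
        for (int n = from; n <= to; n++) {
            terms.add(width > 0
                    ? String.format("%0" + width + "d", n)
                    : Integer.toString(n));
        }
        Collections.sort(terms);
        return terms;
    }

    public static void main(String[] args) {
        // Unpadded: "10" sorts between "1" and "2".
        System.out.println(sortedTerms(1, 10, 0)); // [1, 10, 2, 3, 4, 5, 6, 7, 8, 9]
        // Padded to two digits: string order matches numeric order.
        System.out.println(sortedTerms(1, 10, 2)); // [01, 02, 03, 04, 05, 06, 07, 08, 09, 10]
    }
}
```

With fixed-width padding, a string range such as "02" to "09" covers exactly the numbers 2 through 9, which is what a term-based range query needs.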

RE: Range Queries Performance Hit

2010-03-25 Thread Uwe Schindler
You should use NumericRangeQuery and NumericField (since 2.9). - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: suman.hol...@zapak.co.in [mailto:suman.hol...@zapak.co.in] > Sent: Thursday, March 25, 2010

Range Queries Performance Hit

2010-03-25 Thread suman . holani
Hello, Range queries are lowering the performance of search. I am using dates in my clucene application. A lucene doc has these kinds of fields: startdt="1242758400" enddt="1241980500" Now when I am searching for searchingdate = new RangeQuery(lastyear time in seconds,current time in secon

Re: Filters and multiple, per-segment calls to getDocIdSet

2010-03-25 Thread Michael McCandless
On Thu, Mar 25, 2010 at 12:55 AM, Daniel Noll wrote: > Hi all. > > I notice that Filter.getDocIdSet() is now documented as follows: > >    Note: This method will be called once per segment in >    the index during searching.  The returned {@link DocIdSet} >    must refer to document IDs for tha
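
The documented requirement quoted above (one call per segment, with document IDs relative to that segment) can be illustrated without Lucene. The following is a minimal sketch, assuming a filter result has already been computed against the whole index as a `java.util.BitSet` of top-level document IDs. The names `SegmentSlice`, `sliceForSegment`, `docBase` and `maxDoc` are hypothetical; how a real filter obtains a segment's starting offset and size from its reader is not shown here.

```java
import java.util.BitSet;

// Hypothetical illustration class; not part of Lucene.
public class SegmentSlice {

    // Copies the bits of `topLevel` that fall in [docBase, docBase + maxDoc)
    // into a new BitSet, shifted so that the segment's first document is 0.
    static BitSet sliceForSegment(BitSet topLevel, int docBase, int maxDoc) {
        BitSet segment = new BitSet(maxDoc);
        int end = docBase + maxDoc;
        for (int doc = topLevel.nextSetBit(docBase);
             doc >= 0 && doc < end;
             doc = topLevel.nextSetBit(doc + 1)) {
            segment.set(doc - docBase);
        }
        return segment;
    }

    public static void main(String[] args) {
        // Matches computed once against the whole index (top-level IDs).
        BitSet topLevel = new BitSet();
        topLevel.set(2);
        topLevel.set(5);
        topLevel.set(7);
        topLevel.set(11);

        // Three assumed segments covering documents 0-3, 4-9 and 10-11.
        System.out.println(sliceForSegment(topLevel, 0, 4));  // {2}
        System.out.println(sliceForSegment(topLevel, 4, 6));  // {1, 3}
        System.out.println(sliceForSegment(topLevel, 10, 2)); // {1}
    }
}
```

Computing once at the top level and slicing per call keeps a single copy of the result in memory; only the small per-segment views are created on demand.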

Re: Custom Filter

2010-03-25 Thread Ian Lea
Could this maybe have something to do with per-segment readers, as mentioned in recent message from Daniel? Posting lucene version and the full stack trace dump is always a good idea. -- Ian. On Wed, Mar 24, 2010 at 6:56 PM, Siraj Haider wrote: > Hello there, > I am getting exception when run