Re: Lucene 2.9 and 3.0: Optimized index is thrice as large as the not optimized index

2010-01-11 Thread Michael McCandless
become smaller. > - The optimized index has practically the same size as the not optimized one. > > Yuliya > >> -Original Message- >> From: Michael McCandless [mailto:luc...@mikemccandless.com] >> Sent: Friday, 8 January 2010 14:38 >> To: java-user@l

Re: Lucene 2.9 and 3.0: Optimized index is thrice as large as the not optimized index

2010-01-08 Thread Michael McCandless
sparsely"? > > Thanks, > Yuliya > >> -Original Message- >> From: Michael McCandless [mailto:luc...@mikemccandless.com] >> Sent: Thursday, 7 January 2010 18:00 >> To: java-user@lucene.apache.org >> Subject: Re: Lucene 2.9 and 3.0: Optimi

Re: Lucene 2.9 and 3.0: Optimized index is thrice as large as the not optimized index

2010-01-07 Thread Michael McCandless
> -Original Message- >> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] >> Sent: Thursday, 7 January 2010 17:35 >> To: java-user@lucene.apache.org >> Subject: Re: Lucene 2.9 and 3.0: Optimized index is thrice as >> large as the not optimize

Re: Lucene 2.9 and 3.0: Optimized index is thrice as large as the not optimized index

2010-01-07 Thread Simon Willnauer
Do you have a reader open on the index which was opened before your index was optimized? Maybe there is a reader around holding on to references to the merged segments. simon On Thu, Jan 7, 2010 at 5:23 PM, Yuliya Palchaninava wrote: > Hi, > > According to the api documentation: "In genera

Re: Lucene 2.9 and 3.0: Optimized index is thrice as large as the not optimized index

2010-01-07 Thread Otis Gospodnetic
Yuliya, The index *directory* will be larger *while* you are optimizing. After the optimization is completed successfully, the index directory will be smaller. It is possible that your index directory is large(r) because you have some left-over segments (e.g. from some earlier failed/interrup

Re: Lucene 2.9: IOException from IndexReader.reopen() - Real time search

2010-01-01 Thread Michael McCandless
On Thu, Dec 31, 2009 at 12:34 PM, Kumaravel Kandasami wrote: > Identified the problem. > > reader.close() was not getting called in a specific logic flow. Phew :) Thanks for bringing closure. Mike - To unsubscribe, e-mail: jav

Re: Lucene 2.9: IOException from IndexReader.reopen() - Real time search

2009-12-31 Thread Kumaravel Kandasami
Identified the problem. reader.close() was not getting called in a specific logic flow. Thank You. Kumar_/|\_ www.saisk.com ku...@saisk.com "making a profound difference with knowledge and creativity..." On Thu, Dec 31, 2009 at 11:11 AM, Kumaravel Kandasami < kumaravel.kandas...@gmail.co

Re: Lucene 2.9: IOException from IndexReader.reopen() - Real time search

2009-12-31 Thread Kumaravel Kandasami
Thanks Mike. I think it has something to do with the merge factor. When I modified the code to call optimize in the finally block, the following error message was thrown. Code Snippet: nameWriter.optimize(); // errors here nameWriter.close(); valueWriter.optimize(); //I am using mult

Re: Lucene 2.9: IOException from IndexReader.reopen() - Real time search

2009-12-31 Thread Michael McCandless
It sounds like you may be running out of file descriptors -- how many segments are in your index? The reopen logic looks correct (you are closing the old reader). Is there anything else that may be holding files open? Have you changed any of IW's settings, eg mergeFactor? Mike On Wed, Dec 30,
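The fix discussed in this thread (closing the old reader after a reopen) can be sketched roughly as follows. This is a minimal illustration and not Lucene code: `FakeReader`, `refresh`, and the integer index version are hypothetical stand-ins. The only behavior borrowed from Lucene 2.9 is that `IndexReader.reopen()` returns the same instance when nothing has changed and a new instance otherwise, in which case the caller is expected to close the old one.

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicInteger;

final class ReopenSketch {

    /** Hypothetical stand-in for an index reader; it only tracks open/closed state. */
    static final class FakeReader implements Closeable {
        private final int version;
        private final AtomicInteger openCount;
        private boolean closed;

        FakeReader(int version, AtomicInteger openCount) {
            this.version = version;
            this.openCount = openCount;
            openCount.incrementAndGet();
        }

        /** Returns this reader if the index is unchanged, otherwise a new reader. */
        FakeReader reopen(int indexVersion) {
            return indexVersion == version ? this : new FakeReader(indexVersion, openCount);
        }

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                openCount.decrementAndGet();
            }
        }
    }

    /** Reopens and closes the old reader only when a different instance came back. */
    static FakeReader refresh(FakeReader current, int indexVersion) {
        FakeReader latest = current.reopen(indexVersion);
        if (latest != current) {
            current.close(); // skipping this on any code path leaks file handles
        }
        return latest;
    }

    public static void main(String[] args) {
        AtomicInteger open = new AtomicInteger();
        FakeReader reader = new FakeReader(1, open);
        for (int v = 1; v <= 5; v++) {
            reader = refresh(reader, v);
        }
        System.out.println("open readers: " + open.get());
        reader.close();
    }
}
```

Routing every reopen through a single helper like `refresh` is one way to make sure no logic flow skips the `close()` call, which is the kind of omission reported later in the thread.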

RE: lucene 2.9+ numeric indexing

2009-11-08 Thread Uwe Schindler
That's indeed strange. The problem has nothing to do with NumericField/NumericUtils and corresponding FieldCache parsing at all, it is more the autodetection falling back to NumericField parser, if the first term is not parseable as old-style numeric. Because of that you get this error message, bec

RE: Lucene 2.9 Spatial Search Problem

2009-10-02 Thread Uwe Schindler
Hallo Rajiv2, The LocalLucene from Sourceforge is not index-compatible to the recently added spatial contrib in Lucene. You have to reindex your spatial values (because the index format now makes use of the new Lucene 2.9 NumericField, which is now the standard for numeric fields). Uwe - Uwe

Re: Lucene 2.9 Spatial Search Problem

2009-10-02 Thread Michael McCandless
The required format for contrib/spatial has changed to NumericField, as of 2.9. Are you building your index with NumericField? Mike On Fri, Oct 2, 2009 at 2:04 PM, Rajiv2 wrote: > > Hello, I was using Lucene 2.4 and locallucene in my app and upgraded to > lucene 2.9 and I'm using the new spatia

Re: Lucene 2.9 and performance of readers per segment.

2009-10-01 Thread Mark Miller
Per-segment search over many segments is actually a bit faster for non-sort cases and many sort cases, but an optimized index will still be fastest. The speed benefit of many segments comes when reopening, say for realtime search; in that case you may want to sacrifice the optimized performance for a segment

RE: Lucene 2.9 RC4 now available for testing

2009-09-13 Thread Uwe Schindler
> Mark Miller wrote: > > Hello Lucene users, > > > > ... > > > > We let out a bug in the lock factory changes we made in RC3 - > > making a new SimpleFSDirectory with a String param would throw > > an illegal state exception - a fix for this is in RC4. > > My apologies - not SimpleFSDirectory, but

Re: Lucene 2.9 RC4 now available for testing

2009-09-13 Thread Mark Miller
Mark Miller wrote: > Hello Lucene users, > > ... > > We let out a bug in the lock factory changes we made in RC3 - > making a new SimpleFSDirectory with a String param would throw > an illegal state exception - a fix for this is in RC4. My apologies - not SimpleFSDirectory, but SimpleFSLockFactory

Re: Lucene 2.9 RC2 now available for testing

2009-09-09 Thread Peter Keegan
> http://svn.apache.org/viewvc?view=rev&revision=630698 This may be it. The scorer is sparse and usually in a conjunction with a dense scorer. Does the index format matter? I haven't yet built it with 2.9. Peter On Wed, Sep 9, 2009 at 10:17 AM, Yonik Seeley wrote: > On Wed, Sep 9, 2009 at 9:40 AM

Re: Lucene 2.9 RC2 now available for testing

2009-09-09 Thread Peter Keegan
>Is it possible that skipTo is very costly with your custom scorer? It's no more expensive than 'next'. The scorer's 'skipTo' and 'next' methods call termdocs.skipTo or termdocs.next to get the next 'candidate' doc. This just checks a BitVector to find the next non-deleted doc. But the scorer mus

Re: Lucene 2.9 RC2 now available for testing

2009-09-09 Thread Yonik Seeley
On Wed, Sep 9, 2009 at 9:40 AM, Peter Keegan wrote: > IndexSearcher.search is calling my custom scorer's 'next' and 'doc' methods > 64% fewer times. I see no 'advance' method in any of the hot spots'. I am > getting the same number of hits from the custom scorer. > Has the BooleanScorer2 logic chan

Re: Lucene 2.9 RC2 now available for testing

2009-09-09 Thread Michael McCandless
Right, BooleanQuery will now try to use BooleanScorer (does "out of order" collection, which does not use skipTo/advance at all, I think) when possible, instead of BooleanScorer2. This only applies for boolean queries that have only SHOULD clauses, and up to 32 MUST_NOT clauses (if there's even 1

Re: Lucene 2.9 RC2 now available for testing

2009-09-09 Thread Mark Miller
How about the new score in-order/out-of-order stuff? It was an option before, but I think now it uses what's best by default? And pairs with the collector? I didn't follow any of that closely though. - Mark Peter Keegan wrote: > IndexSearcher.search is calling my custom scorer's 'next' and 'doc' me

Re: Lucene 2.9 RC2 now available for testing

2009-09-09 Thread Peter Keegan
IndexSearcher.search is calling my custom scorer's 'next' and 'doc' methods 64% fewer times. I see no 'advance' method in any of the hot spots. I am getting the same number of hits from the custom scorer. Has the BooleanScorer2 logic changed? Peter On Wed, Sep 9, 2009 at 9:17 AM, Yonik Seeley <

Re: Lucene 2.9 RC2 now available for testing

2009-09-09 Thread Yonik Seeley
On Wed, Sep 9, 2009 at 9:17 AM, Yonik Seeley wrote: > On Wed, Sep 9, 2009 at 8:57 AM, Peter Keegan wrote: >> Using JProfiler, I observe that the improvement >> is due to a huge reduction in the number of calls to TermDocs.next and >> TermDocs.skipTo (about 65% fewer calls). > > Indexes are searched

Re: Lucene 2.9 RC2 now available for testing

2009-09-09 Thread Yonik Seeley
On Wed, Sep 9, 2009 at 8:57 AM, Peter Keegan wrote: > Using JProfiler, I observe that the improvement > is due to a huge reduction in the number of calls to TermDocs.next and > TermDocs.skipTo (about 65% fewer calls). Indexes are searched per-segment now (i.e. MultiTermDocs isn't normally used). O

Re: Lucene 2.9 RC2 now available for testing

2009-09-09 Thread Peter Keegan
I've been testing 2.9 RC2 lately and comparing query performance to 2.3.2. I'm seeing a huge increase in throughput (2x-10x) on an index that was built with 2.3.2. The queries have a lot of BoostingTermQuerys and boolean clauses containing a custom scorer. Using JProfiler, I observe that the improv

Re: Lucene 2.9 RC2 now available for testing

2009-09-07 Thread Marcelo Ochoa
Hi All: I have already integrated Lucene 2.9RC2 with Lucene Domain Index: http://docs.google.com/Doc?id=ddgw7sjp_54fgj9kg As usual, a new Lucene version makes for a faster product :) All my internal tests run OK and I only need to re-test on a 10g database. Once Lucene 2.9 is ready for produ

Re: Lucene 2.9 RC2 now available for testing

2009-08-28 Thread Mark Miller
Mark Miller wrote: > > Download release candidate 1 here: > http://people.apache.org/~markrmiller/staging-area/lucene2.9rc2/ > In case anyone catches - yes that is a cut and paste typo - should read release candidate 2 (obvious, but just to cross my t's). -- - Mark http://www.lucidimagination.co

Re: Lucene 2.9 RC1 now available for testing

2009-08-28 Thread Mark Miller
The dist build issues have been addressed and RC2 will include the missing analyzer and db contrib binaries. Unfortunately, people.apache.org is not up at the moment (https://blogs.apache.org/infra/entry/apache_org_downtime_initial_report), but I will put up Lucene 2.9 RC2 when it comes back up.

Re: Lucene 2.9 RC1 now available for testing

2009-08-28 Thread Mark Miller
Apologies - you are correct - contrib/analyzers is in src but not the jar distrib. I will address whatever is up with the build process and put up another RC when apache servers are back up. Thanks for pointing this out, - Mark Bogdan Ghidireac wrote: > Thank you, Lucene 2.9 is a great release..

Re: Lucene 2.9 RC1 now available for testing

2009-08-28 Thread Bogdan Ghidireac
Thank you, Lucene 2.9 is a great release... I have one issue so far - I cannot find the contrib/analyzers jars, only the sources are present. Bogdan On Fri, Aug 28, 2009 at 1:17 AM, Mark Miller wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA1 > > Hello Lucene users, > > On behalf of the

Re: Lucene 2.9

2009-06-30 Thread Mark Miller
I hope July. Could easily be August though. I'm kicking and screaming to get it out soon though. It's been hurting my highbrow reputation. On Tue, Jun 30, 2009 at 2:41 PM, Siraj Haider wrote: > is there an ETA for Lucene 2.9 release? > > -siraj > > ---

Re: Lucene 2.9 Release

2009-06-12 Thread Michael McCandless
Also, conversely, if you know of important issues that should be fixed for 2.9, please go and check that the "Fix Version" in Jira is in fact set to 2.9... Mike On Thu, Jun 11, 2009 at 8:41 AM, Mark Miller wrote: > Okay, its only been a short time and we have already whittled the list down > from

Re: Lucene 2.9 Release

2009-06-11 Thread Mark Miller
Okay, it's only been a short time and we have already whittled the list down from 56 to 42. I think we have covered most of the easy calls. If you know an issue you're involved in won't likely be done soon, please help us out and take off the version or push it to 3.1. Next time I go through, I'm

Re: Lucene 2.9

2009-05-21 Thread Michael McCandless
Darned that Google; they need to do better ;) Here's the entry from CHANGES.txt on Lucene's trunk: 2. LUCENE-1382: Add an optional arbitrary String "commitUserData" to IndexWriter.commit(), which is stored in the segments file and is then retrievable via IndexReader.getCommitUserData ins

Re: Lucene 2.9

2009-05-21 Thread Tim Williams
On Thu, May 21, 2009 at 1:12 PM, Michael McCandless wrote: > Sorry for the slow response. > > It's really not clear when 2.9 will be released.  We have accumulated > a number of good improvements -- higher performance field sorting, new > higher performance Collector (replaces HitCollector) API, >

Re: Lucene 2.9

2009-05-21 Thread Michael McCandless
Sorry for the slow response. It's really not clear when 2.9 will be released. We have accumulated a number of good improvements -- higher performance field sorting, new higher performance Collector (replaces HitCollector) API, segment-based searching, attaching a String label to each commit from

Re: Lucene 2.9

2009-03-17 Thread Luis Alves
Mark Miller wrote: Hmmm - you can probably get qsol to do it: http://myhardshadow.com/qsol. I think you can setup any token to expand to anything with a regex matcher and use group capturing in the replacement (I don't fully remember though, been a while since I've used it). So you could do

Re: Lucene 2.9

2009-03-11 Thread Michael McCandless
Yonik Seeley wrote: On Mon, Mar 9, 2009 at 2:02 PM, Michael McCandless wrote: Once added, something inside the index (a "write once" schema) records that this field is an IntField and then it's an error to ever use a different type field by that same name. I dunno... coupling functionalit

Re: Lucene 2.9

2009-03-11 Thread Mark Miller
Hmmm - you can probably get qsol to do it: http://myhardshadow.com/qsol. I think you can setup any token to expand to anything with a regex matcher and use group capturing in the replacement (I don't fully remember though, been a while since I've used it). So you could do a regex of something
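The regex-with-group-capturing idea above can be sketched in plain Java. This is only an illustration of the technique, not qsol's or Lucene's API: `ComparisonRewriter`, its `rewrite` method, and the caller-supplied `upperBound` are hypothetical names and assumptions. It rewrites `field >= n` into the standard query parser range syntax `field:[n TO upperBound]`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; not part of Lucene or qsol. */
final class ComparisonRewriter {

    // Group 1: field name. Group 2: a non-negative integer after ">=".
    private static final Pattern GTE = Pattern.compile("(\\w+)\\s*>=\\s*(\\d+)");

    /**
     * Rewrites every "field >= n" into "field:[n TO upperBound]".
     * The upper bound is an assumption supplied by the caller.
     */
    static String rewrite(String query, long upperBound) {
        Matcher m = GTE.matcher(query);
        // $1 and $2 refer to the captured field name and number.
        return m.replaceAll("$1:[$2 TO " + upperBound + "]");
    }

    public static void main(String[] args) {
        // prints: amount:[15 TO 1000000] AND status:open
        System.out.println(rewrite("amount >= 15 AND status:open", 1000000L));
    }
}
```

Note that a rewritten range only matches numerically if the field's terms sort in numeric order (for example zero-padded values or a trie-encoded field), which is the query parser limitation discussed elsewhere in this thread.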

Re: Lucene 2.9

2009-03-11 Thread Michael McCandless
Allahbaksh Mohammedali Asadullah wrote: For example I want to search amount >= 15 rather than doing it amount:[ 15] or something? Is there any open source queryparser which converts something like amount >=15 into lucene number format query. I don't know of any effort to change Lucene's

Re: Lucene 2.9

2009-03-09 Thread Yonik Seeley
On Mon, Mar 9, 2009 at 2:02 PM, Michael McCandless wrote: > Once added, something inside the index (a "write once" schema) records > that this field is an IntField and then it's an error to ever use a > different type field by that same name. I dunno... coupling functionality to restrictions seem

Re: Lucene 2.9

2009-03-09 Thread Michael McCandless
markharw00d wrote: >>(a "write once" schema) I like this idea. Enforcing consistent field-typing on instances of fields with the same name does not seem like an unreasonable restriction - especially given the upsides to this. And also when it's "opt-in", ie, you can continue to use untyp

Re: Lucene 2.9

2009-03-09 Thread markharw00d
>>(a "write once" schema) I like this idea. Enforcing consistent field-typing on instances of fields with the same name does not seem like an unreasonable restriction - especially given the upsides to this. It doesn't dispense with all the full schema logic in Solr but seems like a useful ba

Re: Lucene 2.9

2009-03-09 Thread Michael McCandless
mark harwood wrote: Time for some standardised index metadata? OK, thinking out loud... What if we created IntField, subclassing Field. It holds a single int, and you can add it to Document just like any other field. Once added, something inside the index (a "write once" schema) records th
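The "write once" schema idea floated above can be sketched as a small registry that records a field's type the first time it is seen and rejects a conflicting type afterwards. This is a hedged illustration of the proposal only: `WriteOnceSchema`, `FieldType`, `register`, and `typeOf` are hypothetical names, and nothing here reflects how Lucene actually stores index metadata.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of a write-once field-type registry. */
final class WriteOnceSchema {

    enum FieldType { INT, TEXT }

    private final Map<String, FieldType> types = new ConcurrentHashMap<>();

    /**
     * Records the type on first use. A later registration with the same
     * type is a no-op; a different type for the same name is an error.
     */
    void register(String fieldName, FieldType type) {
        FieldType previous = types.putIfAbsent(fieldName, type);
        if (previous != null && previous != type) {
            throw new IllegalArgumentException(
                "field '" + fieldName + "' is already " + previous + ", cannot add as " + type);
        }
    }

    /** Returns the recorded type, or null for a field that was never typed. */
    FieldType typeOf(String fieldName) {
        return types.get(fieldName);
    }
}
```

Because untyped fields simply never call `register`, the scheme stays opt-in, and a query parser could consult `typeOf` to decide whether to build a numeric range query for a given field.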

Re: Lucene 2.9

2009-03-09 Thread Michael McCandless
mark harwood wrote: This trie/parser issue is an example of a broader issue for me. Yeah I agree. There was also a new Document impl attached in Jira somewhere to more strongly type fields (can't find it now), ie IntField, DateField, etc. And it also ties into refactoring AbstractField/Field

Re: Lucene 2.9

2009-03-09 Thread Yonik Seeley
On Mon, Mar 9, 2009 at 8:10 AM, Michael McCandless wrote: > Could we add APIs to QueryParser so the application can state the > disposition > toward certain fields? overriding QueryParser.getRangeQuery() seems the most powerful and flexible (and it's already there). -Yonik http://www.lucidimagin

Re: Lucene 2.9

2009-03-09 Thread mark harwood
e.apache.org Sent: Monday, 9 March, 2009 13:10:32 Subject: Re: Lucene 2.9 Uwe Schindler wrote: >> Or perhaps we should move Trie* into core Lucene, and then build a >> real (ootb) integration with QueryParser. > > The problem is that the query parser does not know if a fiel

RE: Lucene 2.9

2009-03-09 Thread Allahbaksh Mohammedali Asadullah
he.org Subject: Re: Lucene 2.9 Uwe Schindler wrote: >> Or perhaps we should move Trie* into core Lucene, and then build a >> real (ootb) integration with QueryParser. > > The problem is that the query parser does not know if a field is > encoded as > trie or is just a norma

Re: Lucene 2.9

2009-03-09 Thread Michael McCandless
Uwe Schindler wrote: Or perhaps we should move Trie* into core Lucene, and then build a real (ootb) integration with QueryParser. The problem is that the query parser does not know if a field is encoded as trie or is just a normal text token. Furthermore, the new trie API does not differe

RE: Lucene 2.9

2009-03-09 Thread Uwe Schindler
> -Original Message- > From: Michael McCandless [mailto:luc...@mikemccandless.com] > Sent: Monday, March 09, 2009 12:51 PM > To: java-user@lucene.apache.org > Subject: Re: Lucene 2.9 > > > Uwe Schindler wrote: > > >>> Is there any plans to ha

RE: Lucene 2.9

2009-03-09 Thread Allahbaksh Mohammedali Asadullah
query. Regards, Allahbaksh -Original Message- From: Uwe Schindler [mailto:u...@thetaphi.de] Sent: Monday, March 09, 2009 4:26 PM To: java-user@lucene.apache.org Subject: RE: Lucene 2.9 > > Is there any plans to have simpler queries for Numbers and Data? > > With the

Re: Lucene 2.9

2009-03-09 Thread Michael McCandless
Uwe Schindler wrote: Is there any plans to have simpler queries for Numbers and Data? With the recent addition of TrieRangeQuery (in 2.9), I think Lucene's range querying is actually very strong, though you'd have to subclass QueryParser and override getRangeQuery to have it create TrieRang

RE: Lucene 2.9

2009-03-09 Thread Uwe Schindler
> > Is there any plans to have simpler queries for Numbers and Data? > > With the recent addition of TrieRangeQuery (in 2.9), I think Lucene's > range querying is actually very strong, though you'd have to subclass > QueryParser and override getRangeQuery to have it create TrieRangeQuery. The add

Re: Lucene 2.9

2009-03-09 Thread Michael McCandless
Allahbaksh Mohammedali Asadullah wrote: When is Lucene 2.9 due? I am eagerly waiting for the new lucene to come. There have been some discussions on java-dev, but there's no clear consensus/date yet. We do have quite a few Jira issues marked as 2.9 at this point, which we need to make p