[ANNOUNCE] Apache Lucene 3.3

2011-06-30 Robert Muir
July 2011, Apache Lucene™ 3.3 available. The Lucene PMC is pleased to announce the release of Apache Lucene 3.3. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text sea

Too many open files and ulimit limits reached

2011-06-30 Hiller, Dean x66079
When I do a writer.open(), writer.add(), writer.close(), how many files can I expect Lucene to open? I am indexing some very big data, so we have 16 writers open. I hit the limit of 20 on my machine, so I increased it to the max of 1048576 open files, BUT that might
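One rough way to reason about this (not an answer given in the thread) is that the number of index files open at once grows roughly as writers × live segments per index × files per segment. All three inputs below are hypothetical, and the per-segment file counts in particular are illustrative assumptions. Lucene's compound file format packs most of a segment's files into a single .cfs file, which is why it lowers the count. Merges in progress, open readers, and optional files such as term vectors push it up.

```java
// Back-of-the-envelope estimate of simultaneously open index files.
// Every input is a hypothetical parameter to measure or choose for your own setup.
final class OpenFileEstimate {

    private OpenFileEstimate() {}

    /**
     * @param writers          number of IndexWriters open at the same time
     * @param segmentsPerIndex live segments per index (depends on merge policy settings)
     * @param filesPerSegment  files per segment (assumed ~1 with compound files, more without)
     * @return rough worst-case number of files open at once
     */
    static long estimate(int writers, int segmentsPerIndex, int filesPerSegment) {
        return (long) writers * segmentsPerIndex * filesPerSegment;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 16 writers (as in the question), 20 segments each,
        // 8 files per segment without compound files vs. 1 with them.
        System.out.println("non-compound: " + estimate(16, 20, 8));
        System.out.println("compound:     " + estimate(16, 20, 1));
    }
}
```

Treat the result as an order-of-magnitude worst case rather than a prediction. The real count depends on the merge policy and on which segments are open at a given moment.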

RE: field sorted searches with unbounded hit count

2011-06-30 Tim Eck
Thanks for the confirmation, Mike; two-pass search it is. I appreciate the knowledge on this list very much! -----Original Message----- From: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: Sunday, June 26, 2011 6:00 AM To: java-user@lucene.apache.org Subject: Re: field sorted searche
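The excerpt does not spell out the two-pass approach. A common reading is to count the hits first and then run the field-sorted search with the number of requested hits set to that count. This works because a sorted top-N collector keeps a priority queue bounded by N, so N has to be known up front. The plain-Java sketch below models the idea with int sort keys and an IntPredicate standing in for the query. It is a toy illustration, not Lucene code. In Lucene 3.x the counting pass would typically use TotalHitCountCollector and the second pass IndexSearcher.search(Query, Filter, int, Sort).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.IntPredicate;

// Toy two-pass "sorted search with unbounded hit count".
// Documents are just int sort keys; the IntPredicate stands in for the query.
final class TwoPassSortedSearch {

    private TwoPassSortedSearch() {}

    /** Pass 1: count matching documents, like a count-only collector. */
    static int countHits(int[] sortKeys, IntPredicate matches) {
        int count = 0;
        for (int key : sortKeys) {
            if (matches.test(key)) count++;
        }
        return count;
    }

    /** Pass 2: collect the n smallest matching keys using a queue bounded by n. */
    static List<Integer> topN(int[] sortKeys, IntPredicate matches, int n) {
        if (n <= 0) return new ArrayList<>();
        // Max-heap of size n: its head is the worst (largest) key currently kept.
        PriorityQueue<Integer> queue = new PriorityQueue<>(n, Comparator.<Integer>reverseOrder());
        for (int key : sortKeys) {
            if (!matches.test(key)) continue;
            if (queue.size() < n) {
                queue.add(key);
            } else if (key < queue.peek()) {
                queue.poll();
                queue.add(key);
            }
        }
        List<Integer> result = new ArrayList<>(queue);
        Collections.sort(result); // ascending sort order
        return result;
    }

    /** Count first, then collect exactly that many hits in sorted order. */
    static List<Integer> sortedAllHits(int[] sortKeys, IntPredicate matches) {
        int total = countHits(sortKeys, matches);
        return topN(sortKeys, matches, total);
    }

    public static void main(String[] args) {
        int[] keys = {42, 7, 19, 3, 88, 12};
        System.out.println(sortedAllHits(keys, k -> k % 2 == 0));
    }
}
```

When the count is zero, skip the second pass. Lucene's sorted collectors expect at least one requested hit.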

RE: distributing the indexing process

2011-06-30 Toke Eskildsen
On Thu, 2011-06-30 at 11:45 +0200, Guru Chandar wrote: > Thanks for the response. The documents are all distinct. My (limited) > understanding on partitioning the indexes will lead to results being > different from the case where you have all in one partition, due to > Lucene currently not supp

Re: distributing the indexing process

2011-06-30 Sanne Grinovero
Hello, you could have each node build a separate index, and then merge the results back into a single consistent index using org.apache.lucene.index.IndexWriter.addIndexes(Directory...). Regards, Sanne 2011/6/30 Guru Chandar : > Thanks for the response. The documents are all distinct. My (limited)
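What addIndexes(Directory...) does can be pictured with a toy model. The documents of each separately built index are appended after those already present, and their document numbers are shifted by the running total, so each term's document frequency becomes the sum over the parts. The sketch below uses a hypothetical ToyIndex (term → sorted doc IDs plus a document count). It illustrates the idea only and is not Lucene's merge code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of merging independently built inverted indexes.
// Not Lucene's implementation: an "index" here is just term -> doc IDs plus a doc count.
final class ToyIndexMerge {

    static final class ToyIndex {
        final int maxDoc; // number of documents in this index
        final Map<String, List<Integer>> postings = new TreeMap<>();

        ToyIndex(int maxDoc) { this.maxDoc = maxDoc; }

        /** Doc IDs for a term are assumed to be added in increasing order. */
        void add(String term, int docId) {
            postings.computeIfAbsent(term, t -> new ArrayList<>()).add(docId);
        }

        int docFreq(String term) {
            List<Integer> docs = postings.get(term);
            return docs == null ? 0 : docs.size();
        }
    }

    private ToyIndexMerge() {}

    /** Appends each index after the previous ones, shifting its doc IDs by the running offset. */
    static ToyIndex merge(List<ToyIndex> parts) {
        int total = 0;
        for (ToyIndex part : parts) total += part.maxDoc;
        ToyIndex merged = new ToyIndex(total);
        int base = 0;
        for (ToyIndex part : parts) {
            for (Map.Entry<String, List<Integer>> e : part.postings.entrySet()) {
                for (int docId : e.getValue()) {
                    merged.add(e.getKey(), base + docId);
                }
            }
            base += part.maxDoc;
        }
        return merged;
    }

    public static void main(String[] args) {
        ToyIndex nodeA = new ToyIndex(2); // built on machine A
        nodeA.add("lucene", 0);
        nodeA.add("search", 1);
        ToyIndex nodeB = new ToyIndex(3); // built on machine B
        nodeB.add("lucene", 1);
        nodeB.add("lucene", 2);

        ToyIndex merged = merge(Arrays.asList(nodeA, nodeB));
        System.out.println(merged.maxDoc);
        System.out.println(merged.postings.get("lucene"));
        System.out.println(merged.docFreq("lucene"));
    }
}
```

In this model the merged term statistics match those of one index built over all the documents, which bears on the "logically equivalent" question in the original post. Only the document numbering depends on the order in which the parts are added.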

RE: distributing the indexing process

2011-06-30 Guru Chandar
Thanks for the response. The documents are all distinct. My (limited) understanding is that partitioning the indexes will lead to results that differ from the case where everything is in one partition, because Lucene currently does not support distributed idf. Is this correct? Is there a way to make
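The concern can be made concrete with the idf formula of Lucene 3.x's DefaultSimilarity, idf = 1 + ln(numDocs / (docFreq + 1)). The sketch below computes it per partition and over the combined collection. The document counts are invented to show a skewed split, and the method returns double where Lucene returns float.

```java
// Compares per-partition idf with the idf over the whole collection.
// Formula as in Lucene 3.x DefaultSimilarity: idf = 1 + ln(numDocs / (docFreq + 1)).
final class PartitionIdf {

    private PartitionIdf() {}

    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        // Hypothetical split: the term is common in partition A but rare in partition B.
        int docsA = 1000, dfA = 500;
        int docsB = 1000, dfB = 5;

        System.out.printf("idf A      = %.3f%n", idf(dfA, docsA));
        System.out.printf("idf B      = %.3f%n", idf(dfB, docsB));
        System.out.printf("idf global = %.3f%n", idf(dfA + dfB, docsA + docsB));
    }
}
```

Each partition scores with its own idf, so the same term can carry very different weight on each node. When documents are assigned randomly and partitions are large, the per-partition ratios tend to stay close to the global one, which usually keeps the effect small.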

Re: distributing the indexing process

2011-06-30 Danil ŢORIN
It depends. If all documents are distinct, then yeah, go for it. If you have multiple versions of the same document in your data and you only want to index the latest version... then you need a clever way to split the data to make sure that all versions of a document are indexed on the same host, and you
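Danil's "clever way to split data" can be as simple as choosing the host from a hash of a stable document key, so every version of a document goes to the same node. The sketch assumes each document carries a hypothetical version-independent key string. It uses String.hashCode(), whose algorithm is fixed by the Java platform specification, so the routing is reproducible across JVMs.

```java
// Routes every version of a document to the same indexing node by hashing a stable key.
// The docKey (e.g. a document ID without its version suffix) is a hypothetical field.
final class DocRouter {

    private final int numNodes;

    DocRouter(int numNodes) {
        if (numNodes <= 0) throw new IllegalArgumentException("numNodes must be positive");
        this.numNodes = numNodes;
    }

    /** Same key -> same node, regardless of which version of the document is indexed. */
    int nodeFor(String docKey) {
        // String.hashCode() is specified by the Java platform, so the mapping is stable.
        // floorMod keeps the result in [0, numNodes) even for negative hash codes.
        return Math.floorMod(docKey.hashCode(), numNodes);
    }

    public static void main(String[] args) {
        DocRouter router = new DocRouter(4);
        String[] incoming = {"invoice-17", "invoice-18", "invoice-17"}; // last one: a newer version
        for (String key : incoming) {
            System.out.println(key + " -> node " + router.nodeFor(key));
        }
    }
}
```

On each node, IndexWriter.updateDocument(Term, Document) can then keep only the latest version, since it deletes the documents containing the given term before adding the new one. Changing the number of nodes remaps most keys, so the node count should stay fixed for a given indexing run.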

distributing the indexing process

2011-06-30 Guru Chandar
If we have to index a lot of documents, is there a way to divide the documents into multiple sets and index them on multiple machines in parallel, and then merge the resulting indexes back into a single machine? If yes, will the result be logically equivalent to indexing all the documents on a s

RE: negative wildcard query

2011-06-30 Clemens Wyss
Thx! > -----Original Message----- > From: Uwe Schindler [mailto:u...@thetaphi.de] > Sent: Thursday, June 30, 2011 10:32 > To: java-user@lucene.apache.org > Subject: RE: negative wildcard query > > Pure negative queries do not work; you have to add a MUST clause that hits > all documen

RE: negative wildcard query

2011-06-30 Uwe Schindler
Pure negative queries do not work; you have to add a MUST clause that hits all documents, e.g. MatchAllDocsQuery: query = new BooleanQuery(); query.add(new MatchAllDocsQuery(), Occur.MUST); query.add(new WildcardQuery(new Term("f", "*test*")), Occur.MUST_NOT); Uwe - Uwe Schindler H.-H.-Me
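Here is a toy model of why the MatchAllDocsQuery clause is needed. MUST_NOT clauses only subtract from the documents produced by positive (MUST or SHOULD) clauses, so a query with nothing but MUST_NOT clauses has nothing to subtract from and matches no documents. The sketch works on hypothetical doc-ID sets and ignores scoring. It mirrors the matching rule, not Lucene's implementation, and its own Occur enum only borrows BooleanClause's names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model of BooleanQuery matching over doc-ID sets (no scoring).
// Rule: candidates = AND of MUST clauses (or OR of SHOULD clauses if there is no MUST),
// then every doc matched by a MUST_NOT clause is removed.
final class ToyBoolean {

    enum Occur { MUST, SHOULD, MUST_NOT }

    private final List<Occur> occurs = new ArrayList<>();
    private final List<Set<Integer>> clauseDocs = new ArrayList<>();

    ToyBoolean add(Set<Integer> docs, Occur occur) {
        clauseDocs.add(docs);
        occurs.add(occur);
        return this;
    }

    Set<Integer> matches() {
        Set<Integer> must = null;
        Set<Integer> should = new TreeSet<>();
        Set<Integer> excluded = new TreeSet<>();
        for (int i = 0; i < occurs.size(); i++) {
            Set<Integer> docs = clauseDocs.get(i);
            switch (occurs.get(i)) {
                case MUST:
                    if (must == null) must = new TreeSet<>(docs); else must.retainAll(docs);
                    break;
                case SHOULD:
                    should.addAll(docs);
                    break;
                case MUST_NOT:
                    excluded.addAll(docs);
                    break;
            }
        }
        // Candidates come only from positive clauses; a pure negative query has none.
        Set<Integer> result = (must != null) ? must : should;
        result.removeAll(excluded);
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> allDocs = new TreeSet<>(Arrays.asList(0, 1, 2, 3, 4));
        Set<Integer> containsTest = new TreeSet<>(Arrays.asList(1, 3)); // docs matching *test*

        Set<Integer> pureNegative = new ToyBoolean().add(containsTest, Occur.MUST_NOT).matches();
        Set<Integer> withMatchAll = new ToyBoolean()
                .add(allDocs, Occur.MUST) // plays the role of MatchAllDocsQuery
                .add(containsTest, Occur.MUST_NOT)
                .matches();
        System.out.println(pureNegative);
        System.out.println(withMatchAll);
    }
}
```

The same reasoning covers Clemens's test case in the next message: a QueryWrapperFilter built from a purely negative query admits no documents, so the filtered search returns no hits.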

RE: negative wildcard query

2011-06-30 Clemens Wyss
My testcase/context: query = new BooleanQuery(); query.add( new WildcardQuery( new Term( "f", "*test*" ) ), Occur.MUST_NOT ); filter = new QueryWrapperFilter( query ); result = indexSearcher.search( new WildcardQuery( new Term( "description", "*happy*" ) ), filter, 10 ); The filter never ever le