Is anyone using SOLR in Australia?

2010-03-24 Thread Andrew Bruno
Hi all, I was wondering if anyone is using SOLR successfully in Australia in a high end high transaction system? Cheers Andrew

Filters and multiple, per-segment calls to getDocIdSet

2010-03-24 Thread Daniel Noll
Hi all. I notice that Filter.getDocIdSet() is now documented as follows: Note: This method will be called once per segment in the index during searching. The returned {@link DocIdSet} must refer to document IDs for that segment, not for the top-level reader. If I look at Luce
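The per-segment contract quoted above can be illustrated without Lucene itself. What follows is a minimal sketch in plain Java, not Lucene's API: the class `SegmentDocIds`, its two methods, and the segment sizes in the usage note are all hypothetical, chosen only to show how a top-level document ID relates to a segment-local one through the segment's starting offset.

```java
import java.util.BitSet;

// Hypothetical illustration only; none of these names come from Lucene.
final class SegmentDocIds {

    // Starting top-level ID of each segment, given each segment's document count.
    static int[] docBases(int[] segmentMaxDocs) {
        int[] bases = new int[segmentMaxDocs.length];
        int base = 0;
        for (int i = 0; i < segmentMaxDocs.length; i++) {
            bases[i] = base;
            base += segmentMaxDocs[i];
        }
        return bases;
    }

    // Take the slice of a top-level bit set that belongs to one segment and
    // renumber it so that bit 0 is the segment's first document.
    static BitSet segmentLocal(BitSet topLevel, int docBase, int maxDoc) {
        BitSet local = new BitSet(maxDoc);
        for (int doc = topLevel.nextSetBit(docBase);
             doc >= 0 && doc < docBase + maxDoc;
             doc = topLevel.nextSetBit(doc + 1)) {
            local.set(doc - docBase);
        }
        return local;
    }
}
```

With assumed segment sizes of 3, 4 and 2 documents, the second segment starts at top-level ID 3, so top-level IDs 4 and 6 become segment-local IDs 1 and 3. A filter that caches a bit set built against the top-level reader would need a translation of this kind before returning it for a single segment.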

Fwd: Apache Lucene EuroCon Call For Participation: Prague, Czech Republic May 20 & 21, 2010

2010-03-24 Thread Yonik Seeley
Forwarding to lucene only - the big cross-post caused my gmail filters to "file" it. -Yonik -- Forwarded message -- From: Grant Ingersoll Date: Wed, Mar 24, 2010 at 8:03 PM Subject: Apache Lucene EuroCon Call For Participation: Prague, Czech Republic May 20 & 21, 2010 To: Lucene m

Apache Lucene EuroCon Call For Participation: Prague, Czech Republic May 20 & 21, 2010

2010-03-24 Thread Grant Ingersoll
Apache Lucene EuroCon Call For Participation - Prague, Czech Republic May 20 & 21, 2010 All submissions must be received by Tuesday, April 13, 2010, 12 Midnight CET/6 PM US EDT The first European conference dedicated to Lucene and Solr is coming to Prague from May 18-21, 2010. Apache Lucene E

Seattle Hadoop/Scalability/NoSQL Meetup Wednesday, March 31st. w/ LinkedIn's Jake Mannix

2010-03-24 Thread Bradford Stephens
Greetings, Don't forget that the Hadoop/Scalability/NoSQL meetup is next Wednesday, March 31st at 6:45pm! We're going to have a very exciting guest: Jake Mannix from LinkedIn will talk about machine learning on Hadoop. He's a well-decorated engineer across many disciplines, and even knows quite a

Re: Garbage Collection performance on 2.9.2

2010-03-24 Thread Michael McCandless
Is this during indexing or searching? Mike On Wed, Mar 24, 2010 at 3:45 PM, Grant Ingersoll wrote: > > On Mar 24, 2010, at 2:13 PM, Siraj Haider wrote: > >> We upgraded to 2.9.2 from 2.3.2 and the garbage collection performance >> deteriorated drastically.  The system is going to Full GC cycles

RE: Fields with the same name

2010-03-24 Thread Murdoch, Paul
It was an unexpected coincidence that the two cases ended up with the same field name. I just changed the one case to index with a different field name and that fixed my problem. I was still curious though. Thanks, Paul

Re: Fields with the same name

2010-03-24 Thread Erick Erickson
I don't think so, but a quick way to check would be to look at your index with a copy of Luke and see what the actual tokens are. But I'm not sure it matters, I don't think you *can* make things work out well; your query-time analysis will be...er...difficult. You only get to specify one analyzer

Re: Garbage Collection performance on 2.9.2

2010-03-24 Thread Grant Ingersoll
On Mar 24, 2010, at 2:13 PM, Siraj Haider wrote: > We upgraded to 2.9.2 from 2.3.2 and the garbage collection performance > deteriorated drastically. The system is going to Full GC cycles with long > pauses very frequently. Did something change that we need to account > for? Yes, quite

Custom Filter

2010-03-24 Thread Siraj Haider
Hello there, I am getting an exception when running queries with the new getDocIdSet() in my custom filter. Following is the code for my getDocIdSet() function: public DocIdSet getDocIdSet(IndexReader reader) throws IOException { OpenBitSet bitSet = new OpenBitSet(reader.maxDoc()); for (in

Fields with the same name

2010-03-24 Thread Murdoch, Paul
Hi, I have a quick question. If I have an index where some text values are indexed under the same field name, but some are ANALYZED and some are NOT_ANALYZED, does the last value's flags change the flags for the whole field name? For instance if I index 3 sentences under a field name as ANALYZ

Garbage Collection performance on 2.9.2

2010-03-24 Thread Siraj Haider
We upgraded to 2.9.2 from 2.3.2 and the garbage collection performance deteriorated drastically. The system is going to Full GC cycles with long pauses very frequently. Did something change that we need to account for? Thanks in advance, -siraj

Re: Lucene query with long strings

2010-03-24 Thread Grant Ingersoll
On Mar 24, 2010, at 9:20 AM, Shashi Kant wrote: > Add the common terms such as "University", "School", "Medicine", > "Institute" etc. to stopwords list, so you are left with Stanford, > "Palo Alto" etc. I don't know if I would remove them, but you might consider using the CommonGram or n-gram a

Re: Lucene query with long strings

2010-03-24 Thread Shashi Kant
Add the common terms such as "University", "School", "Medicine", "Institute" etc. to stopwords list, so you are left with Stanford, "Palo Alto" etc. Then use Ahmet's suggestion of using BooleanQuery.setMinimumNumberShouldMatch() set to (say) 75% of the query string length. Finally, if you wish to
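The arithmetic behind that suggestion can be sketched in plain Java. This is only an illustration under stated assumptions: the class `MinShouldMatch` and its methods are hypothetical and not part of Lucene, "query string length" is read as the number of terms left after stopword removal, and rounding up is a choice made here, not something the post specifies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper; not part of Lucene.
final class MinShouldMatch {

    // The common terms named in the post, lowercased.
    static final Set<String> STOPWORDS = new HashSet<>(
            Arrays.asList("university", "school", "medicine", "institute"));

    // Split on whitespace, lowercase, and drop the stopwords above.
    static List<String> significantTerms(String query) {
        List<String> terms = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            String term = token.toLowerCase(Locale.ROOT);
            if (!term.isEmpty() && !STOPWORDS.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }

    // Number of optional clauses that must match, rounded up (an assumption).
    static int minimumShouldMatch(int termCount, double fraction) {
        return (int) Math.ceil(termCount * fraction);
    }
}
```

In a real application the computed count would be the integer passed to BooleanQuery.setMinimumNumberShouldMatch() on a query whose term clauses are all optional (SHOULD).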