Re: highlighting

2011-08-03 Thread govind bhardwaj
Hi Sabeer, I used Lucene 3.3.0 for testing your code. (I doubt that Lucene 4.0 has been released, as version 3.3.0 was released only recently, in July.) In the second case there is no output because of exact matching: sourceText contains "transportation" but not "transport", so there is no exact match. One c

Re: Grouping Clauses to Preserve Order of Boolean Precedence

2011-08-03 Thread Chris Hostetter
: Thanks Ian. How would you achieve the logic of the below query using : BooleanQuery and BooleanClause.occur? How would you achieve the grouping : effect? : : (Marketing AND Smith) OR Davies The same way the query parser does: that's a BooleanQuery (A) with two "SHOULD" clauses, the first of w

Re: Multiple Query clauses impacting result

2011-08-03 Thread Chris Hostetter
: So in a business scenario where we have to make a decision based on the : "accepted" matching of a document (say perform activity A only when a : document matches more than 50%), we won't be able to rely on the match score : because the score will change based on our query and sometimes 80% matc

Re: Thread locking while merging (ConcurrentMergeScheduler issue?)

2011-08-03 Thread Devon H. O'Dell
For what it's worth, I've seen this happen too (using the stock Lucene 3.3 Java APIs), but it requires me to index many millions of documents, and doesn't start being a really big problem until the indexes get to be closer to 250GB in size. When they reach around 1TB, it will take around an hour fo

Re: Thread locking while merging (ConcurrentMergeScheduler issue?)

2011-08-03 Thread Pierre-Henri Toussaint
OK, so the problem definitely comes from the slow merging. I slightly increased the merge count and merge thread count to avoid the problem described previously but, as expected, that just delayed it. Results: 75 minutes to index the 33GB XML file, and 150 minutes to finish the merge after indexer.close

Re: Thread locking while merging (ConcurrentMergeScheduler issue?)

2011-08-03 Thread Michael McCandless
On Wed, Aug 3, 2011 at 1:22 PM, Pierre-Henri Toussaint wrote: >> It looks like merging is running too slowly in your environment, >> relative to indexing; all of your indexing threads are stuck wanting >> to launch a new merge but there's already the max allowed (1) >> concurrent merge running an

Re: Thread locking while merging (ConcurrentMergeScheduler issue?)

2011-08-03 Thread Simon Willnauer
On Wed, Aug 3, 2011 at 7:22 PM, Pierre-Henri Toussaint wrote: > Hello, > > First many thanks for getting in touch. > > > Michael McCandless-2 wrote: >> >> It looks like merging is running too slowly in your environment, >> relative to indexing; all of your indexing threads are stuck wanting >> to

Re: Thread locking while merging (ConcurrentMergeScheduler issue?)

2011-08-03 Thread Pierre-Henri Toussaint
Hello, First many thanks for getting in touch. Michael McCandless-2 wrote: > > It looks like merging is running too slowly in your environment, > relative to indexing; all of your indexing threads are stuck wanting > to launch a new merge but there's already the max allowed (1) > concurrent mer

Re: Thread locking while merging (ConcurrentMergeScheduler issue?)

2011-08-03 Thread Simon Willnauer
On Wed, Aug 3, 2011 at 5:15 PM, Pierre-Henri Toussaint wrote: > I tried to switch back to Lucene 3.2.0, same configuration, and I encountered > the same problem, be at a later stage. > See  http://piratepad.net/ep/pad/export/ro.kMgHIoReJ2w/rev.2?format=txt here > the thread dump . > can you expla

Re: Thread locking and extreme low perfs while merging (ConcurrentMergeScheduler issue ?

2011-08-03 Thread Michael McCandless
It looks like merging is running too slowly in your environment, relative to indexing; all of your indexing threads are stuck wanting to launch a new merge but there's already the max allowed (1) concurrent merge running and so IW (intentionally) stalls them. Are you sure you passed 2 for numThrea

Re: Grouping Clauses to Preserve Order of Boolean Precedence

2011-08-03 Thread Jim Swainston
Thanks Ian. How would you achieve the logic of the query below using BooleanQuery and BooleanClause.Occur? How would you achieve the grouping effect? (Marketing AND Smith) OR Davies Thanks a lot. Jim On 3 August 2011 14:54, Ian Lea wrote: > I don't think there is an easy way. Brackets are

Re: Multiple Query clauses impacting result

2011-08-03 Thread Saurabh Gokhale
Hi Uwe, Thanks for clarifying; the link you gave has a satisfactory explanation. So in a business scenario where we have to make a decision based on the "accepted" matching of a document (say perform activity A only when a document matches more than 50%), we won't be able to rely on t

Re: Thread locking while merging (ConcurrentMergeScheduler issue?)

2011-08-03 Thread Pierre-Henri Toussaint
I tried switching back to Lucene 3.2.0 with the same configuration and encountered the same problem, but at a later stage. The thread dump is here: http://piratepad.net/ep/pad/export/ro.kMgHIoReJ2w/rev.2?format=txt

Thread locking and extreme low perfs while merging (ConcurrentMergeScheduler issue ?

2011-08-03 Thread Pierre-Henri Toussaint
Hi All, I'm currently testing the new DocumentsWriterPerThread in Lucene 4.0.0 (latest build). I use the full English Wikipedia article dump as a source for indexing and the ThreadedIndexWriter implementation proposed in LIA to achieve concurrent indexing. Indexing performance seems good at the be

Re: Grouping Clauses to Preserve Order of Boolean Precedence

2011-08-03 Thread Ian Lea
I don't think there is an easy way. Brackets are the official way to do it with the query parser: http://lucene.apache.org/java/3_3_0/queryparsersyntax.html#Grouping For anything non-trivial I prefer to build up queries in code using BooleanQuery. That way it is comparatively easy to build in wh

Grouping Clauses to Preserve Order of Boolean Precedence

2011-08-03 Thread Jim Swainston
Hi, I'm having trouble thinking of a way to effectively group clauses to form sub queries. For example, I need to handle the following query: Marketing AND Smith OR Davies. Lucene is currently parsing this as +Marketing +Smith Davies meaning that results where only the term Davies is found are
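
To make the difference between the two readings in this thread concrete, here is a minimal stand-alone sketch in plain Java. It does not use Lucene's classes; the class and method names (BooleanGroupingSketch, matchesFlat, matchesGrouped) and the lowercase term sets are assumptions made purely for illustration. It only models which documents match: in the flat form "+Marketing +Smith Davies" the two required clauses decide matching and the optional Davies clause does not, while in the grouped form "(Marketing AND Smith) OR Davies" the top level holds two optional clauses and at least one of them has to match.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative model only: these are NOT Lucene classes or Lucene API calls.
public class BooleanGroupingSketch {

    // Flat reading "+Marketing +Smith Davies": both required terms must be present.
    // The optional term (davies) could influence ranking but not whether a document matches.
    static boolean matchesFlat(Set<String> docTerms) {
        return docTerms.contains("marketing") && docTerms.contains("smith");
    }

    // Grouped reading "(Marketing AND Smith) OR Davies": the top level has two
    // optional clauses, so at least one of them must match.
    static boolean matchesGrouped(Set<String> docTerms) {
        boolean innerGroup = docTerms.contains("marketing") && docTerms.contains("smith");
        boolean davies = docTerms.contains("davies");
        return innerGroup || davies;
    }

    public static void main(String[] args) {
        Set<String> onlyDavies = new HashSet<String>(Arrays.asList("davies"));
        Set<String> marketingSmith = new HashSet<String>(Arrays.asList("marketing", "smith"));

        System.out.println("only davies, flat:         " + matchesFlat(onlyDavies));
        System.out.println("only davies, grouped:      " + matchesGrouped(onlyDavies));
        System.out.println("marketing+smith, flat:     " + matchesFlat(marketingSmith));
        System.out.println("marketing+smith, grouped:  " + matchesGrouped(marketingSmith));
    }
}
```

A document containing only "davies" is rejected by the flat form and accepted by the grouped form, which is the behaviour the original question is after. In Lucene itself the grouped form corresponds to nesting one BooleanQuery inside another, as discussed in the replies.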