Hi Sabeer,
I used Lucene 3.3.0 for testing your code. (I doubt that Lucene 4.0 has been
released yet, since version 3.3.0 came out only recently, in July.)
In the second case, because of exact matching, there is no output, i.e. there is
no "transport" (no exact match), only "transportation", in sourceText. One
c
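A toy sketch (plain Java, not Lucene) of why an exact term lookup for "transport" misses a sourceText that only contains "transportation", while a prefix lookup finds it. In query-parser syntax the prefix form would be written transport* (a PrefixQuery). The tokenizer here, which lowercases and splits on anything that is not a letter or digit, is an assumption standing in for whatever analyzer actually indexed sourceText.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

/** Toy model (not the Lucene API): exact term lookup vs. prefix lookup. */
public class ExactVsPrefix {
    /** Assumed analyzer: lowercase, split on non-letters/non-digits. */
    static Set<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toSet());
    }

    /** Exact match: the query term must equal one indexed token. */
    static boolean exactMatch(String sourceText, String term) {
        return tokenize(sourceText).contains(term.toLowerCase(Locale.ROOT));
    }

    /** Prefix match: any indexed token that starts with the query term counts. */
    static boolean prefixMatch(String sourceText, String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        return tokenize(sourceText).stream().anyMatch(t -> t.startsWith(p));
    }

    public static void main(String[] args) {
        String sourceText = "Public transportation in the city";
        System.out.println(exactMatch(sourceText, "transport"));  // false
        System.out.println(prefixMatch(sourceText, "transport")); // true
    }
}
```

A stemming analyzer applied at both index and query time is the other common way to make such variants meet; which one fits depends on how loose the matching is allowed to be.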
: Thanks Ian. How would you achieve the logic of the below query using
: BooleanQuery and BooleanClause.occur? How would you achieve the grouping
: effect?
:
: (Marketing AND Smith) OR Davies
The same way the query parser does: that's a BooleanQuery (A) with two
"SHOULD" clauses, the first of w
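To make the structure concrete, here is a small stdlib-only model of Lucene's boolean matching rules (not the Lucene API itself): every MUST clause has to match, no MUST_NOT clause may match, and SHOULD clauses are only required (at least one) when there is no MUST clause. It contrasts the flat parse +Marketing +Smith Davies with the grouped (Marketing AND Smith) OR Davies. In real Lucene code the grouped form is built by adding an inner BooleanQuery to an outer one via BooleanQuery.add(Query, BooleanClause.Occur.SHOULD). The class names TermQ/BoolQ and the lowercased terms are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Toy model of boolean matching (hypothetical classes, not org.apache.lucene). */
public class GroupingModel {
    enum Occur { MUST, SHOULD, MUST_NOT }

    interface Query {
        boolean matches(Set<String> docTerms);
    }

    static final class TermQ implements Query {
        private final String text;
        TermQ(String text) { this.text = text; }
        @Override
        public boolean matches(Set<String> docTerms) { return docTerms.contains(text); }
    }

    static final class BoolQ implements Query {
        private final List<Query> clauses = new ArrayList<>();
        private final List<Occur> occurs = new ArrayList<>();

        BoolQ add(Query q, Occur occur) {
            clauses.add(q);
            occurs.add(occur);
            return this;
        }

        @Override
        public boolean matches(Set<String> docTerms) {
            boolean hasMust = false;
            boolean anyShould = false;
            for (int i = 0; i < clauses.size(); i++) {
                boolean m = clauses.get(i).matches(docTerms);
                switch (occurs.get(i)) {
                    case MUST:
                        if (!m) return false;
                        hasMust = true;
                        break;
                    case MUST_NOT:
                        if (m) return false;
                        break;
                    case SHOULD:
                        anyShould |= m;
                        break;
                }
            }
            // With MUST clauses present, SHOULD clauses are optional (they only
            // influence scoring); otherwise at least one SHOULD clause must match.
            return hasMust || anyShould;
        }
    }

    /** Flat parse of "Marketing AND Smith OR Davies": +marketing +smith davies */
    static Query flat() {
        return new BoolQ()
                .add(new TermQ("marketing"), Occur.MUST)
                .add(new TermQ("smith"), Occur.MUST)
                .add(new TermQ("davies"), Occur.SHOULD);
    }

    /** (Marketing AND Smith) OR Davies: two SHOULD clauses, the first a nested query. */
    static Query grouped() {
        BoolQ inner = new BoolQ()
                .add(new TermQ("marketing"), Occur.MUST)
                .add(new TermQ("smith"), Occur.MUST);
        return new BoolQ()
                .add(inner, Occur.SHOULD)
                .add(new TermQ("davies"), Occur.SHOULD);
    }

    public static void main(String[] args) {
        Set<String> daviesOnly = Set.of("davies");
        System.out.println(flat().matches(daviesOnly));    // false
        System.out.println(grouped().matches(daviesOnly)); // true
    }
}
```

A document containing only "davies" is dropped by the flat query but matched by the grouped one, which is the behaviour the brackets are meant to produce.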
: So in a business scenario where we have to make a decision based on the
: "accepted" matching of a document (say, perform activity A only when a
: document matches more than 50%), we won't be able to rely on the match score
: because the score will change based on our query and sometimes 80% matc
For what it's worth, I've seen this happen too (using the stock Lucene
3.3 Java APIs), but it requires me to index many millions of
documents, and doesn't start being a really big problem until the
indexes get to be closer to 250GB in size. When they reach around 1TB,
it will take around an hour fo
OK, so the problem definitely comes from the slow merging.
I slightly increased the merge count and merge thread count to avoid the problem
described previously, but, as expected, that only delayed it.
Results: 75 minutes to index the 33GB XML file, and 150 minutes to finish
the merge after indexer.close().
On Wed, Aug 3, 2011 at 1:22 PM, Pierre-Henri Toussaint
wrote:
>> It looks like merging is running too slowly in your environment,
>> relative to indexing; all of your indexing threads are stuck wanting
>> to launch a new merge but there's already the max allowed (1)
>> concurrent merge running an
On Wed, Aug 3, 2011 at 7:22 PM, Pierre-Henri Toussaint
wrote:
> Hello,
>
> First many thanks for getting in touch.
>
>
> Michael McCandless-2 wrote:
>>
>> It looks like merging is running too slowly in your environment,
>> relative to indexing; all of your indexing threads are stuck wanting
>> to
Hello,
First many thanks for getting in touch.
Michael McCandless-2 wrote:
>
> It looks like merging is running too slowly in your environment,
> relative to indexing; all of your indexing threads are stuck wanting
> to launch a new merge but there's already the max allowed (1)
> concurrent mer
On Wed, Aug 3, 2011 at 5:15 PM, Pierre-Henri Toussaint
wrote:
> I tried switching back to Lucene 3.2.0 with the same configuration, and I
> encountered the same problem, but at a later stage.
> The thread dump is here:
> http://piratepad.net/ep/pad/export/ro.kMgHIoReJ2w/rev.2?format=txt
>
can you expla
It looks like merging is running too slowly in your environment,
relative to indexing; all of your indexing threads are stuck wanting
to launch a new merge but there's already the max allowed (1)
concurrent merge running and so IW (intentionally) stalls them.
Are you sure you passed 2 for numThrea
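The stall described above is a form of back-pressure: only a fixed number of merges may run at once, and a thread that wants to start another one waits. Below is a simplified model of that idea using a java.util.concurrent.Semaphore. It is not the IndexWriter or merge-scheduler code, and the class and method names are hypothetical. It also suggests why raising the limit, as reported earlier in this thread, only postpones the stall if merging stays slower than indexing: the extra slots eventually fill up too.

```java
import java.util.concurrent.Semaphore;

/**
 * Simplified back-pressure model (hypothetical, not Lucene's implementation):
 * at most maxMergeCount merges run at once; further requests must wait.
 */
public class MergeStallModel {
    private final Semaphore slots;

    MergeStallModel(int maxMergeCount) {
        this.slots = new Semaphore(maxMergeCount);
    }

    /** Non-blocking check: true if a merge can start now, false if the caller would stall. */
    boolean tryStartMerge() {
        return slots.tryAcquire();
    }

    /** Blocking variant: an indexing thread calling this waits until a slot is free. */
    void startMerge() throws InterruptedException {
        slots.acquire();
    }

    /** Called when a running merge completes; frees one slot. */
    void finishMerge() {
        slots.release();
    }

    public static void main(String[] args) {
        MergeStallModel m = new MergeStallModel(1); // max allowed (1) concurrent merge
        System.out.println(m.tryStartMerge()); // true: first merge starts
        System.out.println(m.tryStartMerge()); // false: a second request would stall
        m.finishMerge();
        System.out.println(m.tryStartMerge()); // true: the slot was freed
    }
}
```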
Thanks Ian. How would you achieve the logic of the below query using
BooleanQuery and BooleanClause.occur? How would you achieve the grouping
effect?
(Marketing AND Smith) OR Davies
Thanks a lot.
Jim
On 3 August 2011 14:54, Ian Lea wrote:
> I don't think there is an easy way. Brackets are
Hi Uwe,
Thanks for clarifying; the link you gave has a satisfactory
explanation.
So in a business scenario where we have to make a decision based on the
"accepted" matching of a document (say, perform activity A only when a
document matches more than 50%), we won't be able to rely on t
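If "matches more than 50%" is taken to mean "contains more than half of the distinct query terms" (an assumption about the intended rule), that share can be computed directly instead of being read off the score, which is not a normalized percentage. Within Lucene, BooleanQuery.setMinimumNumberShouldMatch(int) can enforce a similar rule at query time, e.g. requiring 3 of 4 SHOULD term clauses. The helper below is a hypothetical stdlib sketch of the arithmetic only.

```java
import java.util.Set;

/** Hypothetical helper: fraction of distinct query terms present in a document. */
public class TermCoverage {
    static double coverage(Set<String> queryTerms, Set<String> docTerms) {
        if (queryTerms.isEmpty()) return 0.0;
        int hits = 0;
        for (String t : queryTerms) {
            if (docTerms.contains(t)) hits++;
        }
        return (double) hits / queryTerms.size();
    }

    /** Smallest number of matching terms that is strictly more than half of numTerms. */
    static int minShouldForMajority(int numTerms) {
        return numTerms / 2 + 1;
    }

    public static void main(String[] args) {
        Set<String> query = Set.of("marketing", "smith", "davies", "london");
        Set<String> doc = Set.of("marketing", "smith", "davies", "report");
        double c = coverage(query, doc);
        System.out.println(c);                       // 0.75
        System.out.println(c > 0.5);                 // true
        System.out.println(minShouldForMajority(4)); // 3
    }
}
```

Unlike a relevance score, this ratio depends only on which query terms a document contains, so the same threshold means the same thing across different queries.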
I tried switching back to Lucene 3.2.0 with the same configuration, and I
encountered the same problem, but at a later stage.
The thread dump is here:
http://piratepad.net/ep/pad/export/ro.kMgHIoReJ2w/rev.2?format=txt
Hi All,
I'm currently testing the new DocumentsWriterPerThread in Lucene 4.0.0
(latest build). I use the full English Wikipedia article dump as the source
for indexing, and the ThreadedIndexWriter implementation proposed in LIA
(Lucene in Action) to achieve concurrent indexing.
Indexing performance seems good at the be
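The general idea behind a threaded writer like the one from LIA is to hand each addDocument call to a pool of worker threads feeding one thread-safe writer. Below is a stdlib-only sketch of that pattern, assuming a fixed pool of numThreads workers, a bounded queue, and caller-runs back-pressure (the submitting thread indexes the document itself when the queue is full). It is not the LIA class; PooledIndexer and its sink, a stand-in for IndexWriter.addDocument, are hypothetical.

```java
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Thread-pool front end for a thread-safe sink (hypothetical stand-in for IndexWriter). */
public class PooledIndexer<D> {
    private final ThreadPoolExecutor pool;
    private final Consumer<D> sink;

    public PooledIndexer(Consumer<D> sink, int numThreads, int maxQueueSize) {
        this.sink = sink;
        // Bounded queue + CallerRunsPolicy: when the queue is full, the producer
        // runs the task itself, which throttles how fast documents are submitted.
        this.pool = new ThreadPoolExecutor(numThreads, numThreads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(maxQueueSize),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    public void addDocument(D doc) {
        pool.execute(() -> sink.accept(doc));
    }

    /** Stops accepting work and waits until all queued documents have been processed. */
    public void finish() {
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("indexing did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for indexing", e);
        }
    }

    /** Demo: pushes n fake documents through numThreads workers; returns how many arrived. */
    static int demo(int n, int numThreads) {
        Queue<Integer> indexed = new ConcurrentLinkedQueue<>();
        PooledIndexer<Integer> ix = new PooledIndexer<>(indexed::add, numThreads, 4);
        for (int i = 0; i < n; i++) {
            ix.addDocument(i);
        }
        ix.finish();
        return indexed.size();
    }

    public static void main(String[] args) {
        System.out.println(demo(1000, 2)); // 1000
    }
}
```

Calling finish() before closing the real writer matters: closing while tasks are still queued would lose documents.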
I don't think there is an easy way. Brackets are the official way to
do it with the query parser:
http://lucene.apache.org/java/3_3_0/queryparsersyntax.html#Grouping
For anything non-trivial I prefer to build up queries in code using
BooleanQuery. That way it is comparatively easy to build in wh
Hi,
I'm having trouble finding a way to effectively group clauses to form
subqueries. For example, I need to handle the following query:
Marketing AND Smith OR Davies.
Lucene is currently parsing this as +Marketing +Smith Davies meaning that
results where only the term Davies is found are