Hi Glen,
As far as stats for index/search are concerned, here they are:
* Yes, it is a web based application
* I am currently facing issues when the number of concurrent searches grows:
the system cannot handle more than about 2.5 searches per second.
* JVM command line parameters: -server mode
Smart idea, but it won't help me. I have almost 50 categories, and eventually
I would like to "filter" not just on category but perhaps also on language,
etc.
Karl: what do you mean by measuring the distance between the term vectors and
clustering them in real time?
On Tue, Apr 22, 2008 at 7:39 PM, Glen Newton wrote:
Sorry, I misunderstood the problem. My mistake.
While not optimal, and rather expensive space-wise, you could have, in
addition to the existing keyword field, a field for each category. If
the document being indexed is in category A, add its text only to the
catA field. Then do MoreLikeThis on catA.
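A rough sketch of this per-category-field approach against the Lucene 2.3-era API (the field names, `Field` options, and the existing `writer`, `ir`, and `target` variables are illustrative assumptions; this fragment is not compilable on its own):

```java
// Sketch: duplicate the text into a category-specific field at index time.
Document doc = new Document();
doc.add(new Field("contents", text, Field.Store.NO, Field.Index.TOKENIZED));
if ("A".equals(category)) {
    // Only category-A documents get text in the catA field.
    doc.add(new Field("catA", text, Field.Store.NO, Field.Index.TOKENIZED));
}
writer.addDocument(doc);

// ...then, at query time, point MoreLikeThis at that field only, so both
// matching and term statistics come from category A's field:
MoreLikeThis mlt = new MoreLikeThis(ir);
mlt.setFieldNames(new String[] { "catA" });
Query query = mlt.like(target);
```

The space cost is that each document's text is indexed twice (once in the shared field, once in its category field), which is presumably the "rather expensive space-wise" caveat above.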
I could have up to 2 million documents and growing.
On Tue, Apr 22, 2008 at 7:29 PM, Karl Wettin <[EMAIL PROTECTED]> wrote:
Jonathan Ariel wrote:
Is there any way to execute a MoreLikeThis over a subset of documents? I
need to retrieve a set of interesting keywords from a subset of documents
and not the entire index (imagine that my index has documents categorized as
A, B and C and I just want to work with those categorized as A).
But that doesn't help me with my problem, because the interesting terms are
taken from the entire index and not a subset as I need.
On Tue, Apr 22, 2008 at 6:46 PM, Glen Newton <[EMAIL PROTECTED]> wrote:
That's an excellent idea. I would certainly use such an improved
MultiSearcher.
You should submit a patch.
-Original Message-
From: Glen Newton [mailto:[EMAIL PROTECTED]
Sent: Tuesday, April 22, 2008 10:50 AM
To: java-user@lucene.apache.org
Subject: Re: Binding lucene instance/threads
Instead of this:
MoreLikeThis mlt = new MoreLikeThis(ir);
Reader target = ... // orig source of doc you want to find similarities to
Query query = mlt.like( target);
Hits hits = is.search(query);
do this:
MoreLikeThis mlt = new MoreLikeThis(ir);
Reader target = ... // orig source of doc you want
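Glen's "do this:" alternative is cut off in the archive. One common way to restrict the MoreLikeThis results to a subset (an assumption on my part, not necessarily what Glen proposed; the "category" field name is hypothetical) is to AND the generated query with a term query:

```java
MoreLikeThis mlt = new MoreLikeThis(ir);
Reader target = ...; // orig source of doc you want to find similarities to
Query like = mlt.like(target);

// Restrict matches to category A by AND-ing in a term query.
BooleanQuery query = new BooleanQuery();
query.add(like, BooleanClause.Occur.MUST);
query.add(new TermQuery(new Term("category", "A")), BooleanClause.Occur.MUST);
Hits hits = is.search(query);
```

Note that this only restricts which documents *match*: the "interesting terms" MoreLikeThis extracts are still weighted by whole-index statistics, not by the subset's.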
Is there any way to execute a MoreLikeThis over a subset of documents? I
need to retrieve a set of interesting keywords from a subset of documents
and not the entire index (imagine that my index has documents categorized as
A, B and C and I just want to work with those categorized as A). Right now
: Yes, the versions of Lucene and Java are exactly the same on the different
: machines.
: In fact we un-jarred Lucene and jarred it with our jar and are running from
: the same NFS mounts on both machines.
I didn't do an in-depth code read, but a quick skim of
StandardTokenizerImpl didn't turn up a
Hi Prashant,
What is the Unicode code point associated with the 3rd/4th/5th character?
Steve
On 04/22/2008 at 4:45 PM, Prashant Malik wrote:
Yes, the versions of Lucene and Java are exactly the same on the different
machines.
In fact we un-jarred Lucene and jarred it with our jar and are running from
the same NFS mounts on both machines.
Also, we have tried with Lucene 2.2.0 and 2.3.1, with the same result.
Also, about the actual string u
Hi Prashant,
On 04/22/2008 at 2:23 PM, Prashant Malik wrote:
Hi,
We have been observing the following problem while tokenizing with
Lucene's StandardAnalyzer: the tokens we get are different on different
machines. I suspect it has something to do with the Locale settings on
individual machines.
For example, the word 'César' is split as 'C
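A guess at the cause: it may be the platform default *charset* rather than the Locale itself. If the input files are UTF-8 but one machine decodes them with a Latin-1-style default, accented characters are mangled before the tokenizer ever sees them. A self-contained sketch of the effect:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // Decode the UTF-8 bytes of a word as if read with the given charset.
    public static String decode(String word, Charset cs) {
        return new String(word.getBytes(StandardCharsets.UTF_8), cs);
    }

    public static void main(String[] args) {
        // With the right charset the word survives intact...
        System.out.println(decode("César", StandardCharsets.UTF_8));      // César
        // ...but with a Latin-1 default the same bytes become "CÃ©sar",
        // and the tokenizer then splits on characters that were never there.
        System.out.println(decode("César", StandardCharsets.ISO_8859_1)); // CÃ©sar
    }
}
```

Comparing `file.encoding` (or `Charset.defaultCharset()`) on the two machines would confirm or rule this out.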
So even if you only have one index, this is the way to go to manage
this kind of problem.
Looking at the implementation and having used ThreadPoolExecutor (TPE)
a lot, I would make the following suggestions for this class so as to
better support this particular use case:
Better access to the confi
> one solution is to set-up a ThreadPoolExecutor[2] with a fixed
> number of threads and a limited queue size (use a bound BlockingQueue[3])
Yes, this is precisely how the ConcurrentMultiSearcher works.
https://issues.apache.org/jira/browse/LUCENE-423
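The bounded ThreadPoolExecutor set-up described in the quoted suggestion can be sketched as follows (the pool and queue sizes, and the choice of `CallerRunsPolicy`, are illustrative assumptions):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedPool {
    // A fixed number of worker threads plus a bounded queue: when the queue
    // fills up, CallerRunsPolicy makes the submitting thread run the task
    // itself, throttling producers instead of dropping searches.
    public static ThreadPoolExecutor newBoundedPool(int threads, int queueSize) {
        return new ThreadPoolExecutor(
                threads, threads,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(queueSize),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }
}
```

Search tasks (e.g. one per sub-index) are then submitted with `pool.submit(...)`, and the queue bound caps how much concurrent search load can pile up.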
-Original Message-
From: Glen Newton
Hey Mike,
Thank you very much for looking into this issue!
I originally switched to the SerialMergeScheduler to try to work around this
bug: http://lucene.markmail.org/message/awkkunr7j24nh4qj . I switched back to
the ConcurrentMergeScheduler yesterday (since I would rather fail quickly due
t
Anshum:
Have you looked into the ConcurrentMultiSearcher? It would have you split
your index into N sub-indices, and search each with a dedicated thread.
--Renaud
-Original Message-
From: Anshum [mailto:[EMAIL PROTECTED]
Sent: Monday, April 21, 2008 9:10 PM
To: java-user@lucene.apache
Anshum,
I think I am dealing with an index of similar scale: 6.4 million
records, 83 GB index (see [1] for more info)
I mistakenly thought from your original posting that you were
interested in binding threads to processors for indexing, but it is
sounding like you want to do this for searching.
The hang also only happens if you are using SerialMergeScheduler.
Stu, one question: was there an interesting reason why you switched
back to SerialMergeScheduler? Did you hit an issue with
ConcurrentMergeScheduler?
Mike
Stu Hood <[EMAIL PROTECTED]> wrote:
> Hey gang,
>
> The finally block was
Hmmm, sounds like you need a new Query. I _think_ it could be
something as simple as a MultiplicativeTermQuery or something like that,
whereby instead of adding the score of the payload callback, you would
multiply. That way, if the document with the term does not have the
payload of intere
OK this output was very helpful, thanks! I think I see what's
happening here.
Basically a merge can sneak in when Lucene doesn't expect it to (on
copying a single external segment over), and as a result it never gets
scheduled. This happens only with addIndexesNoOptimize, when the
index you addi
Hi Karl,
Thanks for the suggestions; I would be glad to contribute back to the project.
I'm not too familiar with the inner workings of Lucene, though; how would such
functionality fit into a Query implementation?
My naive interpretation, when I first got hold of Lucene, is that Query is wha
I can think of two ways to get your hands on this information, the simplest
one being to create a filter with the documents that matched your
original query and then run new queries on the index with slop, non-slop,
etc. to find out what's what. This will of course be very expensive
and is thus onl
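Karl's filter idea might be sketched like this (a fragment against the Lucene 2.3-era API; the "contents" field, the phrase terms, and the existing `searcher` and `originalQuery` variables are hypothetical):

```java
// Build a filter from the docs that matched the original query...
Filter matched = new QueryFilter(originalQuery);

// ...then probe exact vs. sloppy phrase variants within that subset only.
PhraseQuery exact = new PhraseQuery();
exact.add(new Term("contents", "foo"));
exact.add(new Term("contents", "bar"));

PhraseQuery sloppy = new PhraseQuery();
sloppy.add(new Term("contents", "foo"));
sloppy.add(new Term("contents", "bar"));
sloppy.setSlop(2); // allow up to 2 positions of reordering/gap

Hits exactHits = searcher.search(exact, matched);
Hits sloppyHits = searcher.search(sloppy, matched);
```

Documents in `sloppyHits` but not in `exactHits` are the ones that only matched with slop, which is the "find out what's what" step; as noted above, doing this per result set is expensive.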
In that case you may want to index each:
Field("Sub","下午去开会","01:02:02");
as a separate document. So your document contains 3 fields
1. title
2. time
3. sub
Then you can get both the title and time by searching the "sub" field.
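The three-field, one-document-per-subtitle layout above might look like this (a sketch against the Lucene 2.3-era API; the Store/Index options and the existing `writer` and `title` variables are assumptions):

```java
// One document per subtitle line; title and time ride along as stored fields.
Document doc = new Document();
doc.add(new Field("title", title, Field.Store.YES, Field.Index.NO));
doc.add(new Field("time", "01:02:02", Field.Store.YES, Field.Index.UN_TOKENIZED));
doc.add(new Field("sub", "下午去开会", Field.Store.YES, Field.Index.TOKENIZED));
writer.addDocument(doc);
```

A hit on the "sub" field then carries its own title and time, so no second lookup is needed.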
Cedric
2008/4/22 王建新 <[EMAIL PROTECTED]>:
>
> Thank you. I only search the sub field, not the time; when searching s