date:20080422

Re: Binding lucene instance/threads to a particular processor(or core)

2008-04-22 Thread Anshum

Hi Glen, As far as stats for index/search are concerned, here they are: * Yes, it is a web based application * I am currently facing issues when the number of concurrent searches goes high. The search is not able to handle over 2.5 searches per second. * JVM command line parameters: -server mode;

Re: MoreLikeThis over a subset of documents

2008-04-22 Thread Jonathan Ariel

Smart idea, but it won't help me. I have almost 50 categories and eventually I would like to "filter" not just on category but maybe also on language, etc. Karl: what do you mean by measure the distance between the term vectors and cluster them in real time? On Tue, Apr 22, 2008 at 7:39 PM, Glen N

Re: MoreLikeThis over a subset of documents

2008-04-22 Thread Glen Newton

Sorry, I misunderstood the problem. My mistake. While not optimal and rather expensive space-wise, you could have - in addition to existing keyword field - a field for each category. If the document being indexed is in category A, only add the text to the catA field. Now do MoreLikeThis on catA.

Re: MoreLikeThis over a subset of documents

2008-04-22 Thread Jonathan Ariel

I could have up to 2 million documents and growing. On Tue, Apr 22, 2008 at 7:29 PM, Karl Wettin <[EMAIL PROTECTED]> wrote: > Jonathan Ariel wrote: > > Is there any way to execute a MoreLikeThis over a subset of documents? I > > need to retrieve a set of interesting keywords from a subset of > >

Re: MoreLikeThis over a subset of documents

2008-04-22 Thread Karl Wettin

Jonathan Ariel wrote: Is there any way to execute a MoreLikeThis over a subset of documents? I need to retrieve a set of interesting keywords from a subset of documents and not the entire index (imagine that my index has documents categorized as A, B and C and I just want to work with those categ

Re: MoreLikeThis over a subset of documents

2008-04-22 Thread Jonathan Ariel

But that doesn't help me with my problem, because the interesting terms are taken from the entire index and not a subset as I need. On Tue, Apr 22, 2008 at 6:46 PM, Glen Newton <[EMAIL PROTECTED]> wrote: > Instead of this: > > MoreLikeThis mlt = new MoreLikeThis(ir); > Reader target = ... // orig

RE: Binding lucene instance/threads to a particular processor(or core)

2008-04-22 Thread Renaud Waldura

That's an excellent idea. I would certainly use such an improved MultiSearcher. You should submit a patch. -Original Message- From: Glen Newton [mailto:[EMAIL PROTECTED] Sent: Tuesday, April 22, 2008 10:50 AM To: java-user@lucene.apache.org Subject: Re: Binding lucene instance/threads

Re: MoreLikeThis over a subset of documents

2008-04-22 Thread Glen Newton

Instead of this:
    MoreLikeThis mlt = new MoreLikeThis(ir);
    Reader target = ... // orig source of doc you want to find similarities to
    Query query = mlt.like(target);
    Hits hits = is.search(query);
do this:
    MoreLikeThis mlt = new MoreLikeThis(ir);
    Reader target = ... // orig source of doc you want

MoreLikeThis over a subset of documents

2008-04-22 Thread Jonathan Ariel

Is there any way to execute a MoreLikeThis over a subset of documents? I need to retrieve a set of interesting keywords from a subset of documents and not the entire index (imagine that my index has documents categorized as A, B and C and I just want to work with those categorized as A). Right now

Re: Lucene standard analyzer internationalization

2008-04-22 Thread Chris Hostetter

: Yes the version of lucene and java are exactly the same on the different : machines. : In fact we unjarred lucene and jarred it with our jar and are running from the : same nfs mounts on both the machines I didn't do an in-depth code read, but a quick skim of StandardTokenizerImpl didn't turn up a

RE: Lucene standard analyzer internationalization

2008-04-22 Thread Steven A Rowe

Hi Prashant, What is the Unicode code point associated with the 3,4,5 character? Steve On 04/22/2008 at 4:45 PM, Prashant Malik wrote: > Yes the version of lucene and java are exactly the same on > the different > machines. > In fact we unjarred lucene and jarred it with our jar and are > running f

Re: Lucene standard analyzer internationalization

2008-04-22 Thread Prashant Malik

Yes the version of lucene and java are exactly the same on the different machines. In fact we unjarred lucene and jarred it with our jar and are running from the same nfs mounts on both the machines. Also we have tried with Lucene 2.2.0 and 2.3.1, with the same result. Also about the actual string u

RE: Lucene standard analyzer internationalization

2008-04-22 Thread Steven A Rowe

Hi Prashant, On 04/22/2008 at 2:23 PM, Prashant Malik wrote: > We have been observing the following problem while > tokenizing using lucene's StandardAnalyzer. Tokens that we get are > different on different machines. I am suspecting it has something to do > with the Locale settings on individu

Lucene standard analyzer internationalization

2008-04-22 Thread Prashant Malik

Hi, We have been observing the following problem while tokenizing using lucene's StandardAnalyzer. Tokens that we get are different on different machines. I am suspecting it has something to do with the Locale settings on individual machines? For example the word 'César' is split as 'CÃ

Re: Binding lucene instance/threads to a particular processor(or core)

2008-04-22 Thread Glen Newton

So even if you only have one index, this is the way to go to manage this kind of problem. Looking at the implementation and having used ThreadPoolExecutor (TPE) a lot, I would make the following suggestions for this class so as to better support this particular use case: Better access to the confi

RE: Binding lucene instance/threads to a particular processor(or core)

2008-04-22 Thread Renaud Waldura

> one solution is to set-up a ThreadPoolExecutor[2] with a fixed > number of threads and a limited queue size (use a bound BlockingQueue[3]) Yes, this is precisely how the ConcurrentMultiSearcher works. https://issues.apache.org/jira/browse/LUCENE-423 -Original Message- From: Glen New
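The quoted suggestion (a ThreadPoolExecutor with a fixed number of threads and a bounded BlockingQueue) can be sketched with java.util.concurrent alone. This is a minimal illustration, not the ConcurrentMultiSearcher code from LUCENE-423: the class name BoundedSearchPool, its parameters, and the choice of AbortPolicy for overload handling are all assumptions made for the example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: caps both the running searches and the backlog of waiting ones. */
final class BoundedSearchPool {
    private final ThreadPoolExecutor executor;

    BoundedSearchPool(int threads, int queueCapacity) {
        // Fixed pool: core size equals maximum size, so no extra threads are ever created.
        // The bounded queue limits how many searches may wait for a free thread.
        // AbortPolicy (an assumption here) makes submit() throw once threads and queue are full.
        this.executor = new ThreadPoolExecutor(
                threads, threads,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(queueCapacity),
                new ThreadPoolExecutor.AbortPolicy());
    }

    /** Submits a search task; throws RejectedExecutionException when the pool is saturated. */
    <T> Future<T> submit(Callable<T> search) {
        return executor.submit(search);
    }

    /** Stops accepting new tasks; already queued tasks still run. */
    void shutdown() {
        executor.shutdown();
    }
}
```

A web application could catch the RejectedExecutionException and answer with a "server busy" response rather than letting requests pile up without limit; whether rejecting or blocking the caller is the better policy depends on the deployment.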

Re: FW: Re: Occasional Hang in IndexWriter.close()

2008-04-22 Thread Stu Hood

Hey Mike, Thank you very much for looking into this issue! I originally switched to the SerialMergeScheduler to try and work around this bug: http://lucene.markmail.org/message/awkkunr7j24nh4qj . I switched back to the ConcurrentMergeScheduler yesterday (since I would rather fail quickly due t

RE: Binding lucene instance/threads to a particular processor(or core)

2008-04-22 Thread Renaud Waldura

Anshum: Have you looked into the ConcurrentMultiSearcher? It would have you split your index into N sub-indices, and search each with a dedicated thread. --Renaud -Original Message- From: Anshum [mailto:[EMAIL PROTECTED] Sent: Monday, April 21, 2008 9:10 PM To: java-user@lucene.apache
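The idea described above (N sub-indices, each searched by its own thread, with the results combined) can be sketched without any Lucene classes. This is not the ConcurrentMultiSearcher implementation: the names ShardedSearchSketch and Hit are hypothetical, each sub-index is modelled as a plain function from a query string to scored hits, and the merge assumes that scores from different sub-indices are directly comparable.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class ShardedSearchSketch {
    /** Hypothetical hit type: a document id plus its score. */
    static final class Hit {
        final String docId;
        final double score;
        Hit(String docId, double score) { this.docId = docId; this.score = score; }
    }

    /**
     * Runs the query against every sub-index in parallel, one thread per sub-index,
     * and returns the topN hits across all of them, best score first.
     */
    static List<Hit> search(List<Function<String, List<Hit>>> subIndices,
                            String query, int topN)
            throws InterruptedException, ExecutionException {
        if (subIndices.isEmpty()) {
            return new ArrayList<>();
        }
        ExecutorService pool = Executors.newFixedThreadPool(subIndices.size());
        try {
            List<Future<List<Hit>>> pending = new ArrayList<>();
            for (Function<String, List<Hit>> subIndex : subIndices) {
                Callable<List<Hit>> task = () -> subIndex.apply(query);
                pending.add(pool.submit(task));
            }
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> future : pending) {
                merged.addAll(future.get()); // blocks until that sub-index has answered
            }
            merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
            return new ArrayList<>(merged.subList(0, Math.min(topN, merged.size())));
        } finally {
            pool.shutdown();
        }
    }
}
```

Creating a pool per call keeps the sketch short; a long-running server would more likely reuse one pool across requests.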

Re: Binding lucene instance/threads to a particular processor(or core)

2008-04-22 Thread Glen Newton

Anshum, I think I am dealing with an index of similar scale: 6.4 million records, 83 GB index (see [1] for more info). I mistakenly thought from your original posting that you were interested in binding threads to processors for indexing, but it is sounding like you want to do this for searching.

Re: FW: Re: Occasional Hang in IndexWriter.close()

2008-04-22 Thread Michael McCandless

The hang also only happens if you are using SerialMergeScheduler. Stu, one question: was there an interesting reason why you switched back to SerialMergeScheduler? Did you hit an issue with ConcurrentMergeScheduler? Mike Stu Hood <[EMAIL PROTECTED]> wrote: > Hey gang, > > The finally block was

Re: how to query against payload

2008-04-22 Thread Grant Ingersoll

Hmmm, sounds like you need a new Query. I _think_ it could be something as simple as MultiplicativeTermQuery or something like that whereby instead of adding the score of the payload callback, you would multiply. That way, if the document with the term does not have the payload of intere

Re: FW: Re: Occasional Hang in IndexWriter.close()

2008-04-22 Thread Michael McCandless

OK this output was very helpful, thanks! I think I see what's happening here. Basically a merge can sneak in when Lucene doesn't expect it to (on copying a single external segment over), and as a result it never gets scheduled. This happens only with addIndexesNoOptimize, when the index you addi

RE: How to Retrieve Found Term?

2008-04-22 Thread Edwin Lee

Hi Karl, Thanks for the suggestions; I would be glad to contribute back to the project. I'm not too familiar with the inner workings of Lucene though; how would such functionality feature in a Query implementation? My naive interpretation, when I first got hold of Lucene, is that Query is wha

Re: How to Retrieve Found Term?

2008-04-22 Thread Karl Wettin

I can think of two ways to get your hands on this information, the simplest one being to create a filter with the documents that matched your original query and then place new queries on the index with slop, non slop, etc. to find out what's what. This will of course be very expensive and is thus onl

Re: Need addtional info for Field（希望看得懂中文的朋友帮我出出主意）

2008-04-22 Thread Cedric Ho

In that case you may want to index each: Field("Sub","下午去开会","01:02:02"); as a separate document. So your document contains 3 fields 1. title 2. time 3. sub then you can get both title and time by searching the "sub" field. Cedric 2008/4/22 王建新 <[EMAIL PROTECTED]>: > > Thank you. I only search sub, not the time; when searching s

25 matches
