Re: Concurrent query benchmarks

2008-06-09 Thread Doron Cohen
On Tue, Jun 10, 2008 at 3:50 AM, Otis Gospodnetic < [EMAIL PROTECTED]> wrote: > Hi Glen, > > Thanks for sharing. Does your benchmarking tool build on top of > contrib/benchmark? (not sure if that one lets you specify the number of > concurrent threads -- if it does not, perhaps this is an opportu

Re: Rebuilding parallel indexes

2008-06-09 Thread Antony Bowesman
Andrzej Bialecki wrote: I have a thought ;) Perhaps you could use a FilteredIndexReader to maintain a map between new IDs and old IDs, and remap on the fly. Although I think that some parts of Lucene depend on the fact that in a normal index the IDs are monotonically increasing ... this would co

Re: Concurrent query benchmarks

2008-06-09 Thread Otis Gospodnetic
Hi Glen, Thanks for sharing. Does your benchmarking tool build on top of contrib/benchmark? (not sure if that one lets you specify the number of concurrent threads -- if it does not, perhaps this is an opportunity to add this functionality). I couldn't find info about the index format (compou

Re: How international languages are supported in Lucene

2008-06-09 Thread Otis Gospodnetic
Aha, I see. I wasn't referring to character-encoding-based lang ID. That is probably good enough if you need to know if the text is English or if it's Chinese or Japanese or Korean or Russian Cyrillic or Arabic or... I think there is a bit missing in your statement about training. You can't

Re: How international languages are supported in Lucene

2008-06-09 Thread Daniel Noll
On Tuesday 10 June 2008 07:49:29 Otis Gospodnetic wrote: > Hi Daniel, > > What makes you say that about language detection? Wouldn't that depend on > the language detection approach or tool one uses and on the type and amount > of content one trains language detector on? And what is the threshold

Concurrent query benchmarks

2008-06-09 Thread Glen Newton
A number of people have asked about query benchmarks. I have posted benchmarks for concurrent query requests for Lucene 2.3.1 on my blog, where I look at 1 - 4096 concurrent requests: http://zzzoot.blogspot.com/2008/06/simultaneous-threaded-query-lucene.html I hope you find this useful. thanks
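
For readers who want to try a similar measurement, here is a minimal sketch of timing N simultaneous requests. It is not Glen's benchmarking tool and is not built on contrib/benchmark. The class name `ConcurrentQueryTimer`, the method `timeConcurrent`, and the stand-in `fakeQuery` are all hypothetical, and a real run would put an actual search call inside the `Callable`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of Lucene or contrib/benchmark.
final class ConcurrentQueryTimer {

    /**
     * Submits `requests` copies of `query` to a pool of `requests` threads
     * and returns the elapsed wall-clock time in nanoseconds.
     * The measurement includes thread start-up, since the pool creates
     * its threads lazily as tasks arrive.
     */
    static long timeConcurrent(int requests, Callable<Integer> query) {
        ExecutorService pool = Executors.newFixedThreadPool(requests);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                tasks.add(query);
            }
            long start = System.nanoTime();
            // invokeAll blocks until every task has completed.
            List<Future<Integer>> results = pool.invokeAll(tasks);
            for (Future<Integer> result : results) {
                result.get(); // surfaces any exception thrown by a task
            }
            return System.nanoTime() - start;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("benchmark run failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Stand-in for a real search; replace with an actual query call.
        Callable<Integer> fakeQuery = () -> calls.incrementAndGet();
        for (int n = 1; n <= 64; n *= 4) {
            long nanos = timeConcurrent(n, fakeQuery);
            System.out.println(n + " concurrent requests: " + (nanos / 1_000) + " us");
        }
    }
}
```

The loop in `main` stops at 64 threads to keep the sketch light; the range of 1 - 4096 mentioned above would need the same call with larger values of `n`.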

Re: How international languages are supported in Lucene

2008-06-09 Thread Otis Gospodnetic
Hi Daniel, What makes you say that about language detection? Wouldn't that depend on the language detection approach or tool one uses and on the type and amount of content one trains the language detector on? And what is the threshold for "reliable enough" that you have in mind? Thanks, Otis --

Re: Running Lucene in a Clustered Environment

2008-06-09 Thread Otis Gospodnetic
Kalani, Is creating a local index and then distributing it to all nodes in the cluster an option for you? Or maybe you can simply put your index on a SAN and let all search nodes access the same index there? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Mes

Re: Compass - Reloading Domain Object Definition Files

2008-06-09 Thread Otis Gospodnetic
Hi, It looks like you wanted to post this to Compass mailing list, but posted it to the Lucene user list. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: stevieray <[EMAIL PROTECTED]> > To: java-user@lucene.apache.org > Sent: Monday, June

Compass - Reloading Domain Object Definition Files

2008-06-09 Thread stevieray
I am interested in changing the search configuration for Grails domain objects on a running Grails instance. As far as I can tell, I need to reload the mappings (Domain.cpm.xml) and then reindex Lucene. Is this the correct approach? If so, how would I go about reloading the mapping files? What

Options of Field constructor accepting Reader

2008-06-09 Thread Itamar Syn-Hershko
Hi all, I was wondering why only the Field constructor that accepts a String offers Store and Index options. I understand there might be no logic in offering them for the TokenStream constructor, but what's wrong with storing input from a Reader, such that 2.3.2 does not allow it? Itamar.

Re: Rebuilding parallel indexes

2008-06-09 Thread Andrzej Bialecki
Antony Bowesman wrote: I have a design where I will be using multiple index shards to hold approx 7.5 million documents per index per month over many years. These will be large static R/O indexes but the corresponding smaller parallel index will get many frequent changes. I understand from p

number of hits per document

2008-06-09 Thread John Byrne
Hi, Is there an easy way to find out the number of hits per document for a Query, rather than just for a Term? Let's say, for example, I have a document like this: "here is cats near dogs and here is cats a long long way from dogs" and I use a SpanNearQuery to find "cats" near "dogs" with a
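
To make the question concrete, here is a minimal plain-Java sketch that tallies "near" matches per document for the example text above. It does not use Lucene, and its proximity rule (the second term follows the first within a fixed number of token positions) is a simplification that is not claimed to match SpanNearQuery's slop semantics. The class `NearMatchCounter`, its methods, and the distance values are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; this is not a Lucene API.
final class NearMatchCounter {

    /**
     * Counts ordered pairs in `text` where `second` appears after `first`
     * and at most `maxDistance` token positions later.
     * Tokens are lower-cased and split on whitespace.
     */
    static int countNear(String text, String first, String second, int maxDistance) {
        String[] tokens = text.toLowerCase().trim().split("\\s+");
        int count = 0;
        for (int i = 0; i < tokens.length; i++) {
            if (!tokens[i].equals(first)) {
                continue;
            }
            // Last position that is still within range of this occurrence.
            int limit = (int) Math.min(tokens.length - 1L, (long) i + maxDistance);
            for (int j = i + 1; j <= limit; j++) {
                if (tokens[j].equals(second)) {
                    count++;
                }
            }
        }
        return count;
    }

    /** Returns a map from document index to match count, omitting documents with no match. */
    static Map<Integer, Integer> countPerDocument(String[] docs, String first,
                                                  String second, int maxDistance) {
        Map<Integer, Integer> hits = new LinkedHashMap<>();
        for (int id = 0; id < docs.length; id++) {
            int n = countNear(docs[id], first, second, maxDistance);
            if (n > 0) {
                hits.put(id, n);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] docs = {
            "here is cats near dogs and here is cats a long long way from dogs",
            "no animals in this one"
        };
        System.out.println(countPerDocument(docs, "cats", "dogs", 3));
        System.out.println(countPerDocument(docs, "cats", "dogs", 6));
    }
}
```

With a tight distance only the first "cats ... dogs" pair in the example counts, while a looser one also picks up the second pair, which is the per-document number the question is after.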

Rebuilding parallel indexes

2008-06-09 Thread Antony Bowesman
I have a design where I will be using multiple index shards to hold approx 7.5 million documents per index per month over many years. These will be large static R/O indexes but the corresponding smaller parallel index will get many frequent changes. I understand from previous replies by Hoss

Running Lucene in a Clustered Environment

2008-06-09 Thread Kalani Ruwanpathirana
Hi all, I'm new to Lucene. I need to run Lucene in a clustered environment, so creating the index in the local file system is not an option; it would be better to create the index in the database so that all nodes can share it. Can anyone suggest a way to do this? I got to know abo

Re: Indexstructure design

2008-06-09 Thread Anshum
Hi, Considering you have that number of documents for each class, you may think of splitting the index (as I believe that the total number would be high). What exactly would you mean by 'get the index' from the result? Do you mean that you would want to fetch the class as well (without actually fet

Re: Indexstructure design

2008-06-09 Thread Sascha Fahl
Hi and thanks a lot, I'll take a look at the HitCollector thing. I think I will have around 500,000 - 1,000,000 docs per class. So having different indices is a good idea, I think, especially because half of the requests will point to only one document class and not to all classes. Is there a wa