On Tue, Jun 10, 2008 at 3:50 AM, Otis Gospodnetic <
[EMAIL PROTECTED]> wrote:
> Hi Glen,
>
> Thanks for sharing. Does your benchmarking tool build on top of
> contrib/benchmark? (not sure if that one lets you specify the number of
> concurrent threads -- if it does not, perhaps this is an opportu
Andrzej Bialecki wrote:
I have a thought ;) Perhaps you could use a FilteredIndexReader to maintain a
map between new IDs and old IDs, and remap on the fly. Although I think that
some parts of Lucene depend on the fact that in a normal index the IDs are
monotonically increasing ... this would co
Hi Glen,
Thanks for sharing. Does your benchmarking tool build on top of
contrib/benchmark? (not sure if that one lets you specify the number of
concurrent threads -- if it does not, perhaps this is an opportunity to add
this functionality).
I couldn't find info about the index format (compou
Aha, I see. I wasn't referring to character-encoding-based lang ID. That is
probably good enough if you need to know if the text is English or if it's
Chinese or Japanese or Korean or Russian Cyrillic or Arabic or...
I think there is a bit missing in your statement about training. You can't
On Tuesday 10 June 2008 07:49:29 Otis Gospodnetic wrote:
> Hi Daniel,
>
> What makes you say that about language detection? Wouldn't that depend on
> the language detection approach or tool one uses and on the type and amount
> of content one trains language detector on? And what is the threshold
A number of people have asked about query benchmarks.
I have posted benchmarks for concurrent query requests for Lucene
2.3.1 on my blog, where I look at 1 - 4096 concurrent requests:
http://zzzoot.blogspot.com/2008/06/simultaneous-threaded-query-lucene.html
I hope you find this useful.
thanks
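
For readers who want to try something similar, here is a minimal sketch of a concurrent-request timing harness. It is not the code behind the blog post: the class name `ConcurrentQueryBench`, the `run` method, and the stand-in query are all hypothetical, and a real test would replace the stand-in with an actual search call against a shared searcher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical harness; not taken from the benchmark described above.
public class ConcurrentQueryBench {

    /**
     * Submits `requests` copies of `query` to a pool of `threads` workers
     * and returns the total wall-clock time in nanoseconds.
     */
    public static long run(int threads, int requests, Callable<Integer> query)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                tasks.add(query);
            }
            long start = System.nanoTime();
            // invokeAll blocks until every submitted task has finished.
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                f.get(); // rethrows any exception thrown by the query
            }
            return System.nanoTime() - start;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a real search; assumed to return a hit count.
        Callable<Integer> fakeQuery = () -> {
            Thread.sleep(1);
            return 1;
        };
        // Concurrency is capped at 256 here to keep the sketch light.
        for (int n = 1; n <= 256; n *= 4) {
            long nanos = run(n, n, fakeQuery);
            System.out.println(n + " concurrent requests: " + (nanos / 1_000_000) + " ms");
        }
    }
}
```

Using one thread per request makes the pool size equal to the concurrency level, which is the simplest reading of "concurrent requests"; a fixed smaller pool would measure queueing instead.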
Hi Daniel,
What makes you say that about language detection? Wouldn't that depend on the
language detection approach or tool one uses and on the type and amount of
content one trains language detector on? And what is the threshold for
"reliable enough" that you have in mind?
Thanks,
Otis --
Kalani,
Is creating a local index and then distributing it to all nodes in the cluster
an option for you?
Or maybe you can simply put your index on a SAN and let all search nodes access
the same index there?
Otis --
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Mes
Hi,
It looks like you wanted to post this to Compass mailing list, but posted it to
the Lucene user list.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: stevieray <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Monday, June
I am interested in changing the search configuration for grails domain
objects on a running grails instance. Near as I can tell, I need to reload
the mappings (Domain.cpm.xml) and then reindex Lucene. Is this the correct
approach? If so, how would I go about reloading the mapping files? What
Hi all,
I was wondering why only the Field constructor which accepts a String offers
Store and Index options? I understand there might be no logic in offering
them for the TokenStream constructor, but what's wrong with storing input
from a Reader, such that 2.3.2 does not allow it?
Itamar.
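
One possible workaround for the question above, sketched on the assumption that the content fits in memory: drain the Reader into a String and pass that to the String-based constructor, which does accept Store and Index options. The helper class `ReaderText` and its `readAll` method are hypothetical names, and the field name in the comment is only an example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper; not part of Lucene.
public class ReaderText {

    /** Reads everything from the Reader into a single String. */
    public static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = readAll(new StringReader("some content to index and store"));
        // With Lucene 2.3.x on the classpath, the String could then be used as:
        //   new Field("body", text, Field.Store.YES, Field.Index.TOKENIZED);
        System.out.println(text.length() + " characters read");
    }
}
```

The trade-off is that the whole value is buffered in memory, which is presumably what the Reader-based constructor is meant to avoid for large inputs.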
Antony Bowesman wrote:
I have a design where I will be using multiple index shards to hold
approx 7.5 million documents per index per month over many years. These
will be large static R/O indexes but the corresponding smaller parallel
index will get many frequent changes.
I understand from p
Hi,
Is there an easy way to find out the number of hits per document for a
Query, rather than just for a Term?
Let's say, for example, I have a document like this:
"here is cats near dogs and here is cats a long long way from dogs"
and I use a SpanNearQuery to find "cats" near "dogs" with a
I have a design where I will be using multiple index shards to hold approx 7.5
million documents per index per month over many years. These will be large
static R/O indexes but the corresponding smaller parallel index will get many
frequent changes.
I understand from previous replies by Hoss
Hi all,
I'm new to Lucene. I need to run Lucene in a clustered environment. So
creating the index in the local file system is not an option and it is
better if I can create the index in the database as all nodes can share it.
Can any of you please suggest a way to do this? I got to know abo
Hi,
Considering you have that number of documents for each class, you may think
of splitting the index (as I believe that the total number would be high).
What exactly would you mean by 'get the index' from the result? Do you mean
that you would want to fetch the class as well (without actually fet
Hi and thanks a lot,
I'll take a look at the HitCollector thing.
I think I will have around 500.000 - 1.000.000 docs per class.
So having different indices is a good idea I think. Especially because
half of the requests will point to only one document class and not to
all classes.
Is there a wa