If essentially all you need is key-value storage, Berkeley DB for Java works
well. Lookup by ID is fast, and it can iterate through documents, supports
secondary keys, handles updates, etc.
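To make that access pattern concrete — fast lookup by primary ID, ordered iteration, and a secondary key — here is a minimal in-memory sketch using `TreeMap` as a stand-in for Berkeley DB JE's primary database plus secondary index. The `DocStore` class and its method names are hypothetical illustrations, not the BDB API:

```java
import java.util.Map;
import java.util.TreeMap;

// In-memory stand-in for the Berkeley DB JE access pattern:
// a primary store keyed by document ID plus a secondary index.
public class DocStore {
    // Primary store: docId -> document body
    private final TreeMap<String, String> primary = new TreeMap<>();
    // Secondary index: author -> docId (BDB JE derives this with a key creator)
    private final TreeMap<String, String> byAuthor = new TreeMap<>();

    public void put(String docId, String author, String body) {
        primary.put(docId, body);
        byAuthor.put(author, docId);
    }

    // Fast lookup by primary ID
    public String get(String docId) {
        return primary.get(docId);
    }

    // Lookup through the secondary key, then resolve against the primary store
    public String getByAuthor(String author) {
        String docId = byAuthor.get(author);
        return docId == null ? null : primary.get(docId);
    }

    // Ordered iteration over all documents
    public Iterable<Map.Entry<String, String>> scan() {
        return primary.entrySet();
    }

    public static void main(String[] args) {
        DocStore store = new DocStore();
        store.put("doc-1", "alice", "first document");
        store.put("doc-2", "bob", "second document");
        System.out.println(store.get("doc-1"));        // prints "first document"
        System.out.println(store.getByAuthor("bob"));  // prints "second document"
    }
}
```

The real BDB JE versions of these calls also give you transactions and on-disk persistence, which the sketch omits.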
Lucene would work relatively well for this, although inserting documents
might not be as fast, because segments need to be merged as documents are
added.
Sounds like you might have an I/O issue. If you have multiple partitions or
disks on the search server, you can search from one partition while copying
to the other, and alternate between them. If you're using RAID, note that
different RAID levels are optimized differently for simultaneous reads and
writes.
If you have a third machine, you could build the index there and copy the
finished index over to the search server.
You can try using MemoryCachedRangeFilter.
https://issues.apache.org/jira/browse/LUCENE-855
It stores field values in memory as longs, so your values don't have to be
lexicographically comparable. Also, MemoryCachedRangeFilter can be orders of
magnitude faster than the standard RangeFilter, depending on your data.
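For intuition on why caching longs matters: range filtering over stored term text compares strings lexicographically, which mis-orders unpadded numbers, while an in-memory long array allows a correct numeric compare per document. A self-contained sketch of the idea (class and method names are illustrative, not the actual LUCENE-855 code):

```java
public class RangeCheck {
    // Lexicographic comparison mis-orders unpadded numbers: "5" sorts after "10".
    static boolean lexInRange(String v, String lo, String hi) {
        return v.compareTo(lo) >= 0 && v.compareTo(hi) <= 0;
    }

    // Numeric comparison over a cached long value is correct and cheap:
    // one array read plus two compares per document.
    static boolean numInRange(long v, long lo, long hi) {
        return v >= lo && v <= hi;
    }

    // The filter idea: precompute one long per document, then build a bit set
    // for a range query with a single pass over the cache.
    static boolean[] filter(long[] cached, long lo, long hi) {
        boolean[] bits = new boolean[cached.length];
        for (int i = 0; i < cached.length; i++) {
            bits[i] = cached[i] >= lo && cached[i] <= hi;
        }
        return bits;
    }

    public static void main(String[] args) {
        System.out.println(lexInRange("5", "1", "10")); // false: "5" sorts after "10"
        System.out.println(numInRange(5, 1, 10));       // true: numerically in range
    }
}
```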
One open issue so far is how MultiSearcher handles a custom Similarity (see
https://issues.apache.org/jira/browse/LUCENE-789).
Hope this helps,
Alexey
-----Original Message-----
From: Andy Liu [mailto:[EMAIL PROTECTED]
Sent: Tuesday, March 06, 2007 3:34 PM
To: java-user@lucene.apache.org
Subject: Using ParallelReader
Is there a working solution out there that would let me use ParallelReader
to search over a large, immutable index and a smaller, auxiliary index that
is updated frequently? Currently, from my understanding, ParallelReader
fails when one of the indexes is updated because the document IDs get out of
sync. Is there an obvious approach I'm just not seeing?
Andy
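For readers unfamiliar with why updates break this setup: ParallelReader joins its sub-indexes purely by document ID, assuming position i in every index refers to the same logical document. A toy sketch of that invariant, with plain arrays standing in for indexes (all names hypothetical):

```java
public class ParallelAlign {
    // ParallelReader-style join: position docId in every "index" refers to
    // the same logical document, so fields can be combined by doc ID alone.
    static String field(String[] index, int docId) {
        return index[docId];
    }

    public static void main(String[] args) {
        String[] bigIndex = {"contentA", "contentB", "contentC"}; // immutable fields
        String[] auxIndex = {"rank=3",   "rank=1",   "rank=9"};   // frequently updated

        // Aligned: doc 1 pairs "contentB" with "rank=1".
        System.out.println(field(bigIndex, 1) + " / " + field(auxIndex, 1));

        // A Lucene update is delete + re-add, which assigns a NEW doc ID at
        // the end of the index. After "updating" doc 1, the aux side becomes:
        String[] auxAfterUpdate = {"rank=3", null /* deleted */, "rank=9", "rank=1"};

        // Doc 1 on the aux side is now a hole, and the updated field lives at
        // doc 3, which doesn't exist in bigIndex: the join silently breaks.
        System.out.println(field(auxAfterUpdate, 1)); // prints "null"
    }
}
```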
On 9/18/06, Paul Elschot <[EMAIL PROTECTED]> wrote:
On Monday 18 September 2006 23:08, Andy Liu wrote:
For multi-word queries, I would like to reward documents that contain a more
even distribution of each word and penalize documents that have a skewed
distribution. For example, if my search query is:
+content:fast +content:car
I would prefer a document that contains each word an equal number of times
over one in which the counts are heavily skewed.
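One simple way to express such an evenness preference is to scale the score by the ratio of the smallest to the largest per-term frequency in the document: equal counts give 1.0 and skewed counts approach 0. This is only an illustrative factor, not anything built into Lucene:

```java
public class EvennessBoost {
    // A possible evenness factor for multi-term queries: the ratio of the
    // smallest to the largest per-term frequency in the document.
    // Equal counts -> 1.0 (no penalty); highly skewed counts -> near 0.
    static float evenness(int[] termFreqs) {
        int min = Integer.MAX_VALUE;
        int max = 0;
        for (int f : termFreqs) {
            min = Math.min(min, f);
            max = Math.max(max, f);
        }
        return max == 0 ? 0f : (float) min / max;
    }

    public static void main(String[] args) {
        // "fast" x5, "car" x5 vs "fast" x9, "car" x1
        System.out.println(evenness(new int[]{5, 5})); // prints 1.0
        System.out.println(evenness(new int[]{9, 1})); // much smaller
    }
}
```

In practice you would fold a factor like this into a custom Similarity or a scoring wrapper, using the per-term frequencies Lucene already gives you.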
>
> Thanks,
> D.
>
> ---------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>
--
Andy Liu
[EMAIL PROTECTED]
(301) 873-8458
Is there a way to calculate term frequency scores that are relative to
the number of terms in the field of the document? We want to override
tf() in this way to curb keyword spamming in web pages. In
Similarity, only the document's term frequency is passed into the tf()
method:
float tf(int freq)
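Since tf() only receives the raw count, the field length has to be folded in elsewhere (Lucene's own length normalization lives in lengthNorm()). Purely as an illustration of the desired math — a frequency made relative to field length before the usual square-root damping — the helper below is a hypothetical standalone function, not an actual Similarity override:

```java
public class RelativeTf {
    // Hypothetical length-relative tf: raw frequency divided by the number
    // of terms in the field, then square-root damping. A page stuffed with
    // one keyword scores no higher than a page with a normal mention rate.
    static float relativeTf(int freq, int fieldLength) {
        if (fieldLength == 0) {
            return 0f;
        }
        return (float) Math.sqrt((double) freq / fieldLength);
    }

    public static void main(String[] args) {
        // 5 occurrences in a 1000-term page vs 50 in a 100-term spam page:
        // the raw count favors the spam page 10x, the relative tf does not.
        System.out.println(relativeTf(5, 1000));
        System.out.println(relativeTf(50, 100));
    }
}
```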