Re: Using lucene as a database... good idea or bad idea?

2008-07-31 Thread Andy Liu
If essentially all you need is key-value storage, Berkeley DB for Java works well. Lookup by ID is fast, and it can iterate through documents and supports secondary keys, updates, etc. Lucene would work relatively well for this, although inserting documents might not be as fast, because segments need to be

Re: Index updates between machines

2007-04-03 Thread Andy Liu
Sounds like you might have an I/O issue. If you have multiple partitions / disks on the searching server, you can search from one partition while copying to another, then alternate. If you're using RAID, different RAID levels are optimized for simultaneous reads and writes. If you have a third machine you

Re: Range search in numeric fields

2007-04-03 Thread Andy Liu
You can try using MemoryCachedRangeFilter (https://issues.apache.org/jira/browse/LUCENE-855). It stores field values in memory as longs, so your values don't have to be lexicographically comparable. Also, MemoryCachedRangeFilter can be orders of magnitude faster than standard RangeFilter, depending
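
To illustrate why keeping the values in memory as longs removes the need for lexicographically comparable terms, here is a minimal plain-Java sketch. It is not the LUCENE-855 code; the class name LongRangeFilterSketch and its bits() method are hypothetical, and it assumes one numeric value per document, indexed by document ID.

```java
import java.util.BitSet;

// Hypothetical illustration only; this is NOT the LUCENE-855 implementation.
// Assumption: exactly one numeric value per document, indexed by document ID.
final class LongRangeFilterSketch {
    // valuesByDoc[docId] holds the numeric field value of that document.
    private final long[] valuesByDoc;

    LongRangeFilterSketch(long[] valuesByDoc) {
        this.valuesByDoc = valuesByDoc.clone();
    }

    /** Returns a bit set with a bit set for every document whose value lies in [lower, upper]. */
    BitSet bits(long lower, long upper) {
        BitSet result = new BitSet(valuesByDoc.length);
        for (int docId = 0; docId < valuesByDoc.length; docId++) {
            long value = valuesByDoc[docId];
            // Numeric comparison: 9 < 100 holds here, whereas the strings
            // "9" and "100" compare the other way around unless zero-padded.
            if (value >= lower && value <= upper) {
                result.set(docId);
            }
        }
        return result;
    }
}
```

With the values {5, 100, 9, -3, 42} and the range [5, 42], the sketch selects documents 0, 2 and 4. No padding or sign encoding of the indexed terms is involved.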

Re: Using ParallelReader over large immutable index and small updatable index

2007-03-07 Thread Andy Liu
so far is how MultiSearcher handles custom Similarity (see https://issues.apache.org/jira/browse/LUCENE-789). Hope this helps, Alexey

Using ParallelReader over large immutable index and small updatable index

2007-03-06 Thread Andy Liu
Is there a working solution out there that would let me use ParallelReader to search over a large, immutable index and a smaller, auxiliary index that is updated frequently? Currently, from my understanding, the ParallelReader fails when one of the indexes is updated because the document IDs get

Re: Lopsided scores for each term in BooleanQuery

2006-09-18 Thread Andy Liu
I'm just not seeing? Andy

Lopsided scores for each term in BooleanQuery

2006-09-18 Thread Andy Liu
For multi-word queries, I would like to reward documents that contain a more even distribution of each word and penalize documents that have a skewed distribution. For example, if my search query is: +content:fast +content:car I would prefer a document that contains each word an equal number of
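
One way to make "even distribution" concrete is a balance factor computed from the per-term frequencies in a document. The sketch below is a hypothetical heuristic, not something proposed in the thread and not a Lucene API; the class TermBalanceSketch and the min/max ratio are assumptions chosen for illustration.

```java
// Hypothetical heuristic, not part of Lucene and not taken from the thread.
final class TermBalanceSketch {
    /**
     * Returns min(freq) / max(freq), a value in [0, 1].
     * 1.0 means every query term occurs equally often in the document;
     * values near 0 mean the occurrences are skewed toward one term.
     */
    static double balance(int[] termFreqs) {
        if (termFreqs.length == 0) {
            return 0.0;
        }
        int min = Integer.MAX_VALUE;
        int max = 0;
        for (int freq : termFreqs) {
            min = Math.min(min, freq);
            max = Math.max(max, freq);
        }
        return max == 0 ? 0.0 : (double) min / max;
    }
}
```

For +content:fast +content:car, a document with frequencies {3, 3} gets a balance of 1.0, while {9, 1} gets about 0.11. How such a factor would be combined with the BooleanQuery score is left open here.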

Re: A very technical question.

2005-09-28 Thread Andy Liu

Relative term frequency?

2005-06-06 Thread Andy Liu
Is there a way to calculate term frequency scores that are relative to the number of terms in the field of the document? We want to override tf() in this way to curb keyword spamming in web pages. In Similarity, only the document's term frequency is passed into the tf() method: float tf(int freq
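
The relative term frequency being asked about can be sketched in plain Java as follows. This is only an illustration of the arithmetic: the class RelativeTfSketch and the numTermsInField parameter are hypothetical, and since tf() receives only the raw frequency, how the field length would reach such a function inside Lucene is exactly the open question.

```java
// Hypothetical helper; it does not extend or call any Lucene class.
final class RelativeTfSketch {
    /**
     * Term frequency relative to field length: freq / numTermsInField.
     * Repeating a keyword in a long field raises this score less than
     * the same number of repeats in a short field.
     */
    static float relativeTf(int freq, int numTermsInField) {
        if (freq <= 0 || numTermsInField <= 0) {
            return 0.0f;
        }
        return (float) freq / numTermsInField;
    }
}
```

For example, 5 occurrences in a 100-term field give 0.05, while the same 5 occurrences in a 1000-term field give 0.005.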