Yes, you'll be fine with 100 million. I've got a couple of
non-performance-sensitive indexes that are more than double that (280M),
with about 20 searchable fields as well. We get results back in the
10-20 second range, which is fine for our end users.
Vince
On 5/13/05, Richard Krenek <[EMAIL PRO
Hypothetically I have 100 million records. Each record has 100+
fields. Only 20 of those fields need to be searched on, the rest
(including the 20) are just for display purposes.
Would it be best to just add the 20 fields to the index and keep the
rest in a relational database? What effect does all
On Tue, 2005-03-01 at 19:23, Chris Hostetter wrote:
> I don't really consider reading/writing to an NFS mounted FSDirectory to
> be viable for the very reasons you listed; but I haven't really found any
> evidence of problems if you take the approach that a single "writer"
> node indexes to local
> -Original Message-
> From: Ian Soboroff [mailto:[EMAIL PROTECTED]
>
> Grossman and Frieder's book, "Information Retrieval, Algorithms and
> Heuristics", is out in a second (and much cheaper, too!) edition,
> probably the most up-to-date textbook.
Much along the same lines, I'm curio
Gary Moore <[EMAIL PROTECTED]> writes:
> Salton, Gerald and McGill, Michael J. /Introduction to Modern
> Information Retrieval/. McGraw-Hill, 1983.
Not only hard to get ahold of these days, but really really really out
of date. This book should be of historical interest only.
Frakes and Baez
Are you sure that
1) Your tokenStream emits terms identical to those
produced by the query - a difference in choice of
analyzer will emit tokens which don't correspond for
the same text, e.g. "dog" != "Dog"
2) Your "body" string represents the same text of the
field in the exact document which matched.
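Point 1 can be illustrated with a minimal sketch in plain Java. It makes no Lucene calls: the class and both tokenizer methods are hypothetical stand-ins for two differently configured analyzers, assumed here only to show why tokens that differ in case never line up with the query terms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-ins for two analyzers; these are NOT Lucene classes.
final class AnalyzerMismatchSketch {

    // Splits on whitespace and keeps the original case.
    static List<String> caseSensitiveTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Splits on whitespace and lowercases every token.
    static List<String> lowercasedTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : caseSensitiveTokens(text)) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Counts how many text tokens are exactly equal to some query term.
    static int matchingTokens(List<String> textTokens, List<String> queryTerms) {
        int matches = 0;
        for (String t : textTokens) {
            if (queryTerms.contains(t)) {
                matches++;
            }
        }
        return matches;
    }
}
```

If the query terms were lowercased but the text handed to the highlighter is tokenized with case preserved, "Dog" in the text never equals "dog" in the query, so nothing is available to highlight. Tokenizing both sides the same way restores the match.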
Hi,
When I do a Phrase Query I do not get any highlights. Here is my call:
highlighter = new Highlighter(new QueryScorer(query.rewrite(indexReader)));
highlighter.getBestFragments(tokenStream, body, numPreviews, ELIPSE);
I tried it without the rewrite, but that didn't help.
Thanks,
Andrew
--
On May 12, 2005, at 10:24 AM, Goel, Nikhil wrote:
1) Lucene does the inverted indexing, by which we mean it keeps how many
times a particular token is used. Is there a way to find out the list of
most frequently used words in descending order?
Have a look at Luke's code to see how it does th
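The general idea behind the question (count how often each token occurs, then list tokens by descending count) can be sketched in plain Java. This is not how Luke or Lucene does it, and it reads no index; the class and method names are hypothetical and the token list is assumed to be already in memory.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; it does not read a Lucene index.
final class TopTermsSketch {

    // Builds a token -> occurrence count map.
    static Map<String, Integer> countTerms(List<String> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
        }
        return counts;
    }

    // Returns the entries sorted by count, highest first;
    // ties are broken alphabetically so the order is deterministic.
    static List<Map.Entry<String, Integer>> byDescendingCount(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }
}
```

For a real index the counts would come from the index's own term statistics rather than from re-counting tokens, so only the sorting step carries over.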
Hello
I've got many documents that are potentially duplicates (merging several
external systems). Any tips on how to find documents that are potentially
duplicates (using a variable ranking like >0.5 match)?
I can use the similarity (MoreLikeThis) method from Sandbox, but that's always
comparing
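One way to picture a ">0.5 match" threshold is the sketch below, in plain Java. It swaps in Jaccard similarity over word sets, which is an assumption made for illustration and is not what MoreLikeThis computes; the class and method names are hypothetical, and comparing every pair of documents this way would be slow on a large collection.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch using Jaccard similarity; NOT the MoreLikeThis algorithm.
final class NearDuplicateSketch {

    // Lowercases the text and returns its distinct whitespace-separated words.
    static Set<String> tokenSet(String text) {
        String[] parts = text.toLowerCase(Locale.ROOT).trim().split("\\s+");
        Set<String> tokens = new HashSet<>(Arrays.asList(parts));
        tokens.remove(""); // produced when the input is blank
        return tokens;
    }

    // Jaccard similarity: |A intersect B| / |A union B|, in the range [0, 1].
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0; // two empty documents are treated as identical
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    // Flags a pair as a potential duplicate when similarity exceeds the threshold.
    static boolean likelyDuplicate(String x, String y, double threshold) {
        return jaccard(tokenSet(x), tokenSet(y)) > threshold;
    }
}
```

With a threshold of 0.5, "the quick brown fox" and "the quick brown dog" share 3 of 5 distinct words (0.6) and would be flagged, while texts with no words in common score 0.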