Here is a use case I am trying to address.
I have two separate indexes, which contain sets of the same document
pool/corpus.
The two indexes have a different set of indexed fields.
One of the indexed fields is an external DocumentID.
I would like to perform searches, like a relational join, expre
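
For illustration, a minimal sketch of the join step, assuming each index is searched separately and each search has already been reduced to a collection of external DocumentID strings. The class and method names are hypothetical and no Lucene API is used here; only the intersection on DocumentID is shown.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DocIdJoin {

    /**
     * Inner join on the external DocumentID: keeps only the IDs that were
     * hit in both indexes, in the order they appeared in the first hit list.
     */
    static Set<String> innerJoin(Collection<String> hitsFromIndexA,
                                 Collection<String> hitsFromIndexB) {
        Set<String> joined = new LinkedHashSet<>(hitsFromIndexA);
        joined.retainAll(new HashSet<>(hitsFromIndexB));
        return joined;
    }

    public static void main(String[] args) {
        // Hypothetical DocumentIDs returned by a query against each index.
        List<String> hitsA = Arrays.asList("doc-1", "doc-2", "doc-3");
        List<String> hitsB = Arrays.asList("doc-3", "doc-1", "doc-9");
        System.out.println(innerJoin(hitsA, hitsB));
    }
}
```

Doing the join in application code like this means both hit lists have to be collected in full, so it is probably only reasonable when at least one side of the join is selective.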
Michael -
Please take a look at our MakeTime UI here: http://www.maketime.com
It is in fact Lucene on the back end - albeit very hard to tell :)
Dejan
-Original Message-
From: Michael Prichard [mailto:[EMAIL PROTECTED]
Sent: Monday, July 17, 2006 8:00 PM
To: java-user@lucene.apache.org
Yes - parallelizing works great - we built a shared-nothing, JavaSpaces-based
system at X1 and on an 11-way cluster were able to index 350 office documents
per second - this included the binary-to-text conversion, using Stellent INSO
libraries. The trick is to create separate indexes and, if you do no
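
A rough sketch of the "separate indexes" idea, not the X1 system itself: documents are split into shards and each worker builds its own index without sharing any state. The in-memory inverted index, the round-robin split and all names are assumptions made for illustration; a real setup would use one Lucene index per worker. Searching across the shards at the end is also only one assumed way to use them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ShardedIndexer {

    /** Builds a tiny inverted index (term -> doc ids) for one shard only. */
    static Map<String, Set<Integer>> indexShard(Map<Integer, String> shardDocs) {
        Map<String, Set<Integer>> index = new TreeMap<>();
        for (Map.Entry<Integer, String> doc : shardDocs.entrySet()) {
            for (String term : doc.getValue().toLowerCase().split("\\W+")) {
                if (!term.isEmpty()) {
                    index.computeIfAbsent(term, t -> new TreeSet<>()).add(doc.getKey());
                }
            }
        }
        return index;
    }

    /** Splits docs round-robin into shards and indexes each shard on its own thread. */
    static List<Map<String, Set<Integer>>> indexInParallel(Map<Integer, String> docs, int shardCount) {
        List<Map<Integer, String>> shards = new ArrayList<>();
        for (int i = 0; i < shardCount; i++) {
            shards.add(new HashMap<>());
        }
        int next = 0;
        for (Map.Entry<Integer, String> doc : docs.entrySet()) {
            shards.get(next++ % shardCount).put(doc.getKey(), doc.getValue());
        }

        ExecutorService pool = Executors.newFixedThreadPool(shardCount);
        try {
            List<Future<Map<String, Set<Integer>>>> futures = new ArrayList<>();
            for (Map<Integer, String> shard : shards) {
                Callable<Map<String, Set<Integer>>> task = () -> indexShard(shard);
                futures.add(pool.submit(task));
            }
            List<Map<String, Set<Integer>>> indexes = new ArrayList<>();
            for (Future<Map<String, Set<Integer>>> future : futures) {
                indexes.add(future.get());
            }
            return indexes;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("indexing interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a shard failed to index", e);
        } finally {
            pool.shutdown();
        }
    }

    /** Searches every shard index and unions the hits (assumed usage, see lead-in). */
    static Set<Integer> search(List<Map<String, Set<Integer>>> shardIndexes, String term) {
        Set<Integer> hits = new TreeSet<>();
        for (Map<String, Set<Integer>> index : shardIndexes) {
            hits.addAll(index.getOrDefault(term, Collections.<Integer>emptySet()));
        }
        return hits;
    }
}
```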
The important detail here is what you mean by "single server"?
A high-end server will work just fine - you want 4GB+ of RAM and the fastest
disk/IO you can get; CPU speed is far less important. A nice Linux software
RAID and 5+ 15K SCSI disks will get you superb performance, at a reasonable
price.
The approach I find best is to create both Email documents - each containing
a list of (and links to) all of its attachments - as well as individual
Attachment documents.
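
To make the two document types concrete, here is a small sketch that flattens one email into an Email document plus one Attachment document per attachment. The field names ("type", "id", "parentId", "attachmentIds" and so on), the ID scheme and the back-link from attachment to email are all assumptions for illustration; plain maps stand in for Lucene documents.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EmailDocuments {

    /**
     * Returns the Email document first, followed by one Attachment document
     * per entry in attachmentTextByName (file name -> extracted text).
     */
    static List<Map<String, String>> toDocuments(String emailId,
                                                 String body,
                                                 Map<String, String> attachmentTextByName) {
        List<Map<String, String>> docs = new ArrayList<>();

        Map<String, String> email = new LinkedHashMap<>();
        email.put("type", "email");
        email.put("id", emailId);
        email.put("body", body);
        docs.add(email);

        List<String> attachmentIds = new ArrayList<>();
        for (Map.Entry<String, String> attachment : attachmentTextByName.entrySet()) {
            // Hypothetical ID scheme: the email ID plus the attachment file name.
            String attachmentId = emailId + "/" + attachment.getKey();
            attachmentIds.add(attachmentId);

            Map<String, String> doc = new LinkedHashMap<>();
            doc.put("type", "attachment");
            doc.put("id", attachmentId);
            doc.put("parentId", emailId); // link back to the containing email
            doc.put("name", attachment.getKey());
            doc.put("text", attachment.getValue());
            docs.add(doc);
        }

        // The Email document carries the list of links to its attachments.
        email.put("attachmentIds", String.join(" ", attachmentIds));
        return docs;
    }
}
```

With this layout a hit on an Attachment document can be traced back to its email via "parentId", and a hit on an Email document can list its attachments without a second search.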
It gets a little tricky when you have a forwarded email, containing an
original Email that contains a tar.gz attachment, which contai
Indeed - you bring up interesting questions. You may want to take a look at
NUTCH first, however - I am not sure if they have done some of the
Google-like ranking you mention.
However - collaborative relevance enhancement, based on user feedback, would
be a nice Web-2.0-ish feature to bake into th
(excuse the semi-appropriate forum to make this comment in - but it is very
brief and may actually help improve the final Lucene-based app)
You may also like to import popularity data from Amazon using their open
APIs and mix the relevancy between your own popularity score and theirs.
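
One simple way to mix the two scores is a weighted average. This is only a sketch: it assumes both popularity scores have already been normalized to the 0..1 range, the weight is a tuning knob you would pick yourself, and fetching the external score is left out entirely.

```java
public class BlendedPopularity {

    /**
     * Linear blend of two popularity scores.
     *
     * @param ownScore      your own popularity score, assumed normalized to 0..1
     * @param externalScore the imported popularity score, assumed normalized to 0..1
     * @param ownWeight     share given to your own score, between 0 and 1
     */
    static double blend(double ownScore, double externalScore, double ownWeight) {
        if (ownWeight < 0.0 || ownWeight > 1.0) {
            throw new IllegalArgumentException("ownWeight must be between 0 and 1");
        }
        return ownWeight * ownScore + (1.0 - ownWeight) * externalScore;
    }

    public static void main(String[] args) {
        // Hypothetical scores: trust your own data a bit more than the imported data.
        System.out.println(blend(0.8, 0.4, 0.7));
    }
}
```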
Dejan (affi
Unfortunately the term search at the site is down - it returns a 500
Internal Server Error.
-Original Message-
From: Dave Kor [mailto:[EMAIL PROTECTED]
Sent: Sunday, September 03, 2006 9:22 PM
To: java-user@lucene.apache.org
Subject: Re: word frequency list?
There is the Berkeley Web Term Frequ
Second that - I was a client of Stellent - the libs work great but are
expensive. To see Stellent in action - get a copy of the free X1 desktop
search or the X1 server (Lucene based).
Another alternative is KeyView from Verity - now Autonomy.
-Original Message-
From: mark harwood [mailto:[