RE: Highlighter Example

2006-09-08 Thread Dejan Nenov
Second that - I was a client of Stellent - the libs work great but are expensive. To see Stellent in action - get a copy of the free X1 desktop search or the X1 server (Lucene based). Another alternative is KeyView from Verity - now Autonomy. -Original Message- From: mark harwood [mailto:[

RE: word frequency list?

2006-09-03 Thread Dejan Nenov
Unfortunately the term search at the site is down - it returns a 500 Internal Server Error. -Original Message- From: Dave Kor [mailto:[EMAIL PROTECTED] Sent: Sunday, September 03, 2006 9:22 PM To: java-user@lucene.apache.org Subject: Re: word frequency list? There is the Berkeley Web Term Frequ

RE: Sorting based on a selling rate

2006-08-28 Thread Dejan Nenov
(excuse the semi-appropriate forum to make this comment in - but it is very brief and may actually help improve the final Lucene-based app) You may also like to import popularity data from Amazon using their open APIs and mix the relevancy between your own popularity score and theirs. Dejan (affi
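A minimal sketch of the score mixing suggested above, in plain Python rather than Lucene code. It assumes both popularity scores are already normalized to the range 0 to 1; the names blend_popularity, rank, own_weight, and the hit fields are hypothetical and chosen only for illustration, and the linear blend is just one possible way to mix the two signals.

```python
def blend_popularity(own_score, external_score, own_weight=0.7):
    """Linear blend of two popularity scores, each assumed to lie in [0, 1]."""
    if not 0.0 <= own_weight <= 1.0:
        raise ValueError("own_weight must be between 0 and 1")
    return own_weight * own_score + (1.0 - own_weight) * external_score


def rank(hits, own_weight=0.7):
    """Sort hits by text relevance boosted by the blended popularity.

    Each hit is a dict with the (hypothetical) keys
    'id', 'relevance', 'own_pop' and 'ext_pop'.
    """
    def final_score(hit):
        popularity = blend_popularity(hit["own_pop"], hit["ext_pop"], own_weight)
        # Popularity acts as a multiplicative boost on top of relevance.
        return hit["relevance"] * (1.0 + popularity)

    return sorted(hits, key=final_score, reverse=True)
```

The weight controls how much the in-house score counts relative to the imported one; it would need tuning against real sales data.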

RE: Scoring Technique based on Relevance Feedback & other Parameters

2006-08-23 Thread Dejan Nenov
Indeed - you bring up interesting questions. You may want to take a look at Nutch first - I am not sure whether they have done some of the Google-like ranking you mention. However, collaborative relevance enhancement, based on user feedback, would be a nice Web-2.0-ish feature to bake into th

RE: Best Practice: emails and file-attachments

2006-08-15 Thread Dejan Nenov
The approach I find best is to create both Email documents - which contain a list of (and links to) all attachments - as well as individual Attachment documents. It gets a little tricky when you have a forwarded email, containing an original Email that contains a tar.gz attachment, which contai
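One way to picture the Email-plus-Attachment layout is the sketch below, in plain Python rather than Lucene code. It assumes each message or attachment has already been parsed into a nested dict; the function to_index_docs and the field names id, kind, text, children, parent_id and child_ids are hypothetical. Every item becomes its own flat record that links to its parent and lists its direct children.

```python
def to_index_docs(item, parent_id=None):
    """Flatten a nested email/attachment tree into one flat record per item.

    'item' is a dict with 'id', 'kind' ('email' or 'attachment'),
    optional 'text', and optional 'children' (nested items).
    """
    children = item.get("children", [])
    doc = {
        "id": item["id"],
        "kind": item["kind"],
        "text": item.get("text", ""),
        "parent_id": parent_id,                      # link back to the container
        "child_ids": [c["id"] for c in children],    # list of direct attachments
    }
    docs = [doc]
    for child in children:
        # Recurse so forwarded emails and archives inside archives are covered.
        docs.extend(to_index_docs(child, parent_id=item["id"]))
    return docs
```

Each resulting record could then be indexed as a separate document, with parent_id and child_ids stored as fields so a hit on an attachment can be traced back to the email that carried it.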

RE: 30 million+ docs on a single server

2006-08-14 Thread Dejan Nenov
The important detail here is what you mean by "single server"? A high-end server will work just fine - you want 4GB+ of RAM and the fastest disk/IO you can get; CPU speed is far less important; a nice Linux software RAID and 5+ 15K SCSI disks will get you superb performance, at a reasonable price.

RE: Indexing large sets of documents?

2006-07-27 Thread Dejan Nenov
Yes - parallelizing works great - we built a shared-nothing JavaSpaces-based system at X1 and on an 11-way cluster were able to index 350 office documents per second - this included the binary-2-text conversion, using Stellent INSO libraries. The trick is to create separate indexes and, if you do no
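The "separate indexes" idea can be sketched roughly as follows. This is plain Python with toy in-memory inverted indexes, not the JavaSpaces system or Lucene code described above; the function names, the round-robin partitioning, and the way results are unioned at query time are all assumptions made for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_index(docs):
    """Build one small inverted index (term -> set of doc ids) for a partition."""
    index = defaultdict(set)
    for doc_id, text in docs:
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def index_in_parallel(docs, workers=4):
    """Split docs round-robin into partitions and index each one independently."""
    partitions = [docs[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Workers share nothing: each one writes only to its own index.
        return list(pool.map(build_index, partitions))


def search(indexes, term):
    """Query every partition index and union the matching doc ids."""
    hits = set()
    for index in indexes:
        hits |= index.get(term.lower(), set())
    return hits
```

Because no worker touches another worker's index, there is no lock contention during indexing; the cost is paid at query time, when every partition has to be consulted.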

RE: Building easy to use search guis? How to save queries...

2006-07-17 Thread Dejan Nenov
Michael - Please take a look at our MakeTime UI here: http://www.maketime.com It is in fact Lucene on the back end - albeit very hard to tell :) Dejan -Original Message- From: Michael Prichard [mailto:[EMAIL PROTECTED] Sent: Monday, July 17, 2006 8:00 PM To: java-user@lucene.apache.org

Are Search Joins Possible between two Physically separate Indexes?

2006-07-13 Thread Dejan Nenov
Here is a use case I am trying to address. I have two separate indexes, which contain sets of the same document pool/corpus. The two indexes have a different set of indexed fields. One of the indexed fields is an external DocumentID. I would like to perform searches, like a relational join, expre
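To make the use case concrete, here is a rough sketch of an application-side join on the external DocumentID. It is plain Python over lists of dicts standing in for the two indexes, not Lucene code; the functions search and join_search, and the example field names, are hypothetical. Each index is searched separately and the results are intersected on DocumentID.

```python
def search(index, field, value):
    """Return the external DocumentIDs of records whose field equals value."""
    return [rec["DocumentID"] for rec in index if rec.get(field) == value]


def join_search(index_a, query_a, index_b, query_b):
    """Inner join of two searches on DocumentID, keeping index_a's hit order.

    Each query is a (field, value) pair evaluated against its own index.
    """
    ids_b = set(search(index_b, *query_b))
    return [doc_id for doc_id in search(index_a, *query_a) if doc_id in ids_b]


index_a = [
    {"DocumentID": "d1", "author": "ann"},
    {"DocumentID": "d2", "author": "bob"},
    {"DocumentID": "d3", "author": "ann"},
]
index_b = [
    {"DocumentID": "d1", "type": "pdf"},
    {"DocumentID": "d2", "type": "pdf"},
    {"DocumentID": "d3", "type": "doc"},
]
matches = join_search(index_a, ("author", "ann"), index_b, ("type", "pdf"))
```

Building a set from the second result list keeps the intersection linear in the number of hits, though both searches still have to return their full ID lists before the join can run.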