How to Uniquely Identify Documents in a Lucene Index

2008-04-27 Thread Hasan Diwan
I'm working on a JSP-based, free-form text storage and retrieval system based on Lucene. Part of my desired feature set includes the ability to retrieve, edit, and update the text comprising a document. The user flow involves a search for a document, whose "all" field is then retrieved, then it can be

Re: Does Lucene save an offline version of web pages?

2008-04-27 Thread Bill Janssen
> - Fetch and index some pages (containing word and pdf documents) on daily basis.
> - Extract all pages that contain some provided keywords after fetching the pages.
> - Create some bulletin from fetched pages, bulletin will be in pdf format and are categorized based on keywords.
> - provide

Re: Does Lucene save an offline version of web pages?

2008-04-27 Thread Lukas Vlcek
Hi, this sounds like a job for Nutch (one of the Lucene family projects).
On Sun, Apr 27, 2008 at 8:26 PM, Legolas wood <[EMAIL PROTECTED]> wrote:
> Hi
> Thank you for reading my post.
> I have to design a system with the following requirements, I think Lucene or one of the projects which are based

TrecDocMaker

2008-04-27 Thread DanaWhite
Greetings, I am trying to use TrecDocMaker so I can successfully index and evaluate Lucene on a TREC collection. It seems like I would just repeatedly call makeDocument() until all the Documents have been created, but makeDocument() appears to just read forever. In general TrecDocMaker seems like

Does Lucene save an offline version of web pages?

2008-04-27 Thread Legolas wood
Hi,
Thank you for reading my post. I have to design a system with the following requirements, I think Lucene or one of the projects which are based on Lucene can help me as a base to continue on. Here are the requirements:
- Fetch and index some pages (containing word and pdf documents) on daily bas

Re: Does lucene support distributed indexing?

2008-04-27 Thread Otis Gospodnetic
There are actually several distributed indexing or searching projects in Lucene (the top-level ASF Lucene project, not Lucene Java), and it's time to start thinking about the possibility of bringing them together, finding commonalities, etc. Here is the summary:
- Lucene - distributed search vi

Re: Does lucene support distributed indexing?

2008-04-27 Thread Samuel Guo
Thanks a lot :)
2008/4/26 Grant Ingersoll <[EMAIL PROTECTED]>:
>
> On Apr 26, 2008, at 2:33 AM, Samuel Guo wrote:
> > Hi all,
> >
> > I am a lucene newbie:)
> >
> > It seems that lucene doesn't support distributed indexing:(
> > As some IR research papers mentioned, when the documents collection