Can I search for deleted documents?

2005-09-12 Thread dozean
Hi, can I search for documents which are marked for deletion? And if a document is marked for deletion, are the terms which occur only in this document (and are indexed) marked for deletion too? Bye Derya

Re: How to get a list of field names of one doc?

2005-09-12 Thread Riccardo Daviddi
No Erik, thx to you! I am sorry, I didn't understand that the enumeration was of Field type. Now it all works. Thank you again! On 9/11/05, Erik Hatcher <[EMAIL PROTECTED]> wrote: > > Riccardo, > > I'm not quite understanding the issue with using Document.fields(). > It returns an Enumeration

PDFBox PDFExtractor

2005-09-12 Thread Rod.Madden
Hi, I am new to Lucene and, looking at some existing Lucene code, I am confused about the relationship (if any) between org.apache.slide.extractor.PDFExtractor methods and org.PDFBox.cos methods for the purposes of working with PDF files. I have found info on the web regarding PD

Re: PDFBox PDFExtractor

2005-09-12 Thread Jeroen Reijn
Hi Rod, PDFBox is a separate project. The PDFExtractor in Jakarta Slide uses PDFBox's functionality to extract the information from the .pdf file. Hope this answers your question. Jeroen [EMAIL PROTECTED] wrote: Hi, I am new to Lucene and looking at some existing Lucene code I

RE: PDFBox PDFExtractor

2005-09-12 Thread Rod.Madden
Thanks for the reply, Jeroen. Does anyone have any experience or comments regarding the use of PDFTextStream versus PDFExtractor for working with PDF files? The issue for us is that there appears to be very high memory usage when we work with PDFs using PDFExtractor. I have heard that PDFTextStream

RE: PDFBox PDFExtractor

2005-09-12 Thread Ben Litchfield
Text extraction from PDF documents is a fairly complex problem and is a delicate balance between speed/memory/accuracy/... How are you measuring your memory usage? In my opinion your two viable options are PDFBox (directly or via Slide's PDFExtractor) and PDFTextStream. They both integrate with l

ParallelReader and Date Filter

2005-09-12 Thread John Smith
Hi, I have Lucene 1.4.3 codebase and I got Parallel Reader from the trunk along with a few changes that need to go on top of it to make it compile. I have 2 indexes, against which I am querying using the Parallel Reader. Most of my queries work great. Thanks for the great work on this featu

Re: IndexReader delete doc! delete terms?

2005-09-12 Thread Yonik Seeley
In general, no. AFAIK, you can still find terms for documents that have been deleted, but the lowest level API for getting documents for that term (TermDocs) checks for deletions. Maybe you can get what you want by keeping your own deleted-docs bit vector. -Yonik
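
The "own deleted-docs bit vector" idea can be sketched without touching Lucene at all. The class below is a hypothetical helper, not part of the Lucene API: the name DeletedDocsTracker and its methods are assumptions made for illustration. It presumes the application calls markDeleted() with the same document number it passes to the index when deleting, and that document numbers stay stable while the tracker is in use (they can change when segments are merged, so the tracker would need rebuilding after that).

```java
import java.util.BitSet;

/**
 * Hypothetical helper (not a Lucene class): remembers which document
 * numbers the application has marked for deletion.
 */
class DeletedDocsTracker {
    // One bit per document number; a set bit means "marked deleted".
    private final BitSet deleted = new BitSet();

    /** Record a deletion; call this alongside the actual index delete. */
    public void markDeleted(int docNum) {
        deleted.set(docNum);
    }

    /** True if this document number was recorded as deleted. */
    public boolean isDeleted(int docNum) {
        return deleted.get(docNum);
    }

    /** All recorded deleted document numbers, in ascending order. */
    public int[] deletedDocs() {
        int[] result = new int[deleted.cardinality()];
        int i = 0;
        for (int d = deleted.nextSetBit(0); d >= 0; d = deleted.nextSetBit(d + 1)) {
            result[i++] = d;
        }
        return result;
    }

    public static void main(String[] args) {
        DeletedDocsTracker tracker = new DeletedDocsTracker();
        tracker.markDeleted(3);
        tracker.markDeleted(7);
        System.out.println("doc 3 deleted? " + tracker.isDeleted(3));
        System.out.println("doc 4 deleted? " + tracker.isDeleted(4));
    }
}
```

With such a tracker, the application could enumerate the deleted document numbers itself instead of relying on term lookups, which skip deleted documents.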

Re: ParallelReader and Date Filter

2005-09-12 Thread Erik Hatcher
On Sep 12, 2005, at 2:04 PM, John Smith wrote: I have Lucene 1.4.3 codebase and I got Parallel Reader from the trunk along with a few changes that need to go on top of it to make it compile. I highly recommend you simply compile the trunk and use it instead of trying to patch these classe

Stale NFS file handle Exception

2005-09-12 Thread Harini Raghavan
Hi All, I have 2 servers in the production environment, one running some Quartz jobs and the other one running the application. There is a common NFS mount which has the lucene index directory. The jobs fetch the latest data and update the lucene index. And the user can search on the index to

Where can I find architectural documents about Lucene and Nutch

2005-09-12 Thread Legolas Woodland
Hi, thank you in advance. Can someone help me find some documents about Lucene and Nutch architecture and "how it works", as well as the algorithms used in Lucene indexing and everything about the underlying system of Lucene and Nutch? Thank you

Re: Where can I find architectural documents about Lucene and Nutch

2005-09-12 Thread Nader Henein
The Lucene wiki should be a good kick-off http://wiki.apache.org/jakarta-lucene/FrontPage?action=show&redirect=FrontPageEN Nader Henein Legolas Woodland wrote: Hi Thank you in advance Can some one help me about finding some documents about lucene and Nutch architecture and "how it works" als

Re: ParallelReader and Date Filter

2005-09-12 Thread John Smith
Thank you. I will try that JS Erik Hatcher <[EMAIL PROTECTED]> wrote: On Sep 12, 2005, at 2:04 PM, John Smith wrote: > I have Lucene 1.4.3 codebase and I got Parallel Reader from the > trunk along with a few changes that need to go on top of it to make > it compile. I highly recommend you si