Re: Indexing MSword Documents

2007-06-09 Thread jim shirreffs
oding, header, and other specificity. Nutch uses specific Word tools (http://lucene.apache.org/nutch/apidocs/org/apache/nutch/parse/msword/package-summary.html), but, IMHO, it's not the most difficult part. M. On 8 June 07 at 19:23, jim shirreffs wrote: Hi, I am trying to index mswor

Re: Indexing MSword Documents

2007-06-08 Thread jim shirreffs
rt. M. On 8 June 07 at 19:23, jim shirreffs wrote: Hi, I am trying to index msword documents. I've got things working but I do not think I am doing things properly. To index msword docs I use an extractor to extract the text. Then I write the text to a .txt file and index that using

Re: Indexing MSword Documents

2007-06-08 Thread jim shirreffs
many thanks I will try that, thanks again! jim s - Original Message - From: "Donna L Gresh" <[EMAIL PROTECTED]> To: Sent: Friday, June 08, 2007 12:52 PM Subject: Re: Indexing MSword Documents I do this exact thing. "text" (the second input to the Field constructor) is MSWord text

Indexing MSword Documents

2007-06-08 Thread jim shirreffs
Hi, I am trying to index msword documents. I've got things working but I do not think I am doing things properly. To index msword docs I use an extractor to extract the text. Then I write the text to a .txt file and index that using an HTMLDocument object. Seems to me that since I have the te

Re: IndexWriter.Optimize() is too slow and IOException! How Can I do?

2007-06-08 Thread jim shirreffs
I am trying to index msword documents. I’ve got things working but I do not think I am doing things properly. To index msword docs I use an extractor to extract the text. Then I write the text to a .txt file and index that using an HTMLDocument object. Seems to me that since I have the text

Re: Indexing PDF document

2007-06-06 Thread jim shirreffs
4 PM Subject: Re: Indexing PDF document you need to include both the Bouncy Castle jars and the FontBox jar. Both are included with the PDFBox distribution. Ben Quoting jim shirreffs <[EMAIL PROTECTED]>: Thanks I rebuilt PDFBox and got past that problem but now I am getting Exc
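For anyone hitting the same NoClassDefFoundError: a minimal sketch, in plain Java with no PDFBox dependency, of checking whether a class can be loaded from the current classpath before running the indexer. `ClasspathCheck` and `isOnClasspath` are hypothetical names made up for this illustration; the only name taken from the thread is the Bouncy Castle provider class from the error message.

```java
// Hypothetical helper (not part of PDFBox or Lucene): reports whether the
// named classes can be loaded from the current classpath.
class ClasspathCheck {

    // Returns true if the class can be located and loaded, false otherwise.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only load the class, do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the class named in the error message; other names can be passed as arguments.
        String[] names = args.length > 0 ? args
            : new String[] { "org.bouncycastle.jce.provider.BouncyCastleProvider" };
        for (String name : names) {
            System.out.println(name + (isOnClasspath(name) ? " : found" : " : MISSING"));
        }
    }
}
```

Run it with the same classpath setting as the indexer. A MISSING line suggests the jar holding that class is not on the classpath.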

Re: Indexing PDF document

2007-06-06 Thread jim shirreffs
Thanks, I rebuilt PDFBox and got past that problem, but now I am getting Exception in thread "main" java.lang.NoClassDefFoundError: org/bouncycastle/jce/provider/BouncyCastleProvider seems my test pdf file is provider locked so I tried a Lucene pdf file and got java.lang.NoClassDefFoundError

Indexing PDF document

2007-06-06 Thread jim shirreffs
Well, I got nowhere trying to index OpenOffice documents, so I thought I'd try indexing PDF documents. Seemed like PDFBox was a good bet, claimed to offer Lucene support and was on the Lucene recommended list. But after numerous attempts failed I decided to try the IndexFiles.java that comes with PDF

Re: Indexing help needed

2007-05-25 Thread jim shirreffs
code up a Reader that just spits out "Here I am" a few hundred times and see what happens. LOL. thank you for the reply and advice. jim s - Original Message - From: "Andrzej Bialecki" <[EMAIL PROTECTED]> To: Sent: Friday, May 25, 2007 1:10 PM Subject: R

Indexing help needed

2007-05-25 Thread jim shirreffs
I've been working on this for a while, I am trying to get the demo code that comes with Lucene to index OpenOffice documents. I've looked at LIUS code and at Nutch code. But can't find an easy way. So I am digging into the code. I wrote a KcmiDocument class that returns a Document. In it I

Re: CAD files, Images

2007-05-23 Thread jim shirreffs
magic" to index it that I know of. Erick On 5/23/07, jim shirreffs <[EMAIL PROTECTED]> wrote: Is it possible to index CAD formats such as AutoCad or CGM? I know some commercial products (excalaber) claim to be able to do that? If so what about TIFF? thanks jim s ---

CAD files, Images

2007-05-23 Thread jim shirreffs
Is it possible to index CAD formats such as AutoCad or CGM? I know some commercial products (excalaber) claim to be able to do that? If so what about TIFF? thanks jim s - To unsubscribe, e-mail: [EMAIL PROTECTED] For additiona

Indexing Open Office documents

2007-05-17 Thread jim shirreffs
Anyone know how to add OpenOffice document to a Lucene index? Is there a parser for OpenOffice? thanks in advance jim s.

Implementing large secure Lucene search system questions.

2007-05-03 Thread jim shirreffs
Hi, I'm a relative Lucene newbie and would appreciate some expert advice. I would like to make files distributed on various local hosts in the intranet full-text searchable. My startup plan is to index these files locally and then merge all the little indexes into a master index on a search

Re: Merging Indices

2007-04-21 Thread jim shirreffs
mixing databases and text searching, and I don't want to go there Of course, this would all work if we could just create the DWIM algorithm... Do What I Mean.. Erick On 4/21/07, jim shirreffs <[EMAIL PROTECTED]> wrote: "Lucene has no concept of "document identity"
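On the "document identity" remark in this thread, a toy sketch in plain Java may help. `ToyIndex` is a hypothetical stand-in and not Lucene's API, and it assumes each document carries a caller-chosen key such as a file path. The idea it illustrates: an append-only store keeps every copy it is handed, and duplicates are skipped only when the calling code tracks keys itself.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy stand-in for an index (NOT Lucene's API): documents are just strings.
class ToyIndex {
    private final List<String> docs = new ArrayList<>();
    private final Set<String> seenKeys = new HashSet<>();

    // Append-only add: no identity check, so repeats create extra entries.
    void addDocument(String text) {
        docs.add(text);
    }

    // Caller-side identity: add only if this key has not been seen before.
    // Returns true if the document was added.
    boolean addIfAbsent(String key, String text) {
        if (!seenKeys.add(key)) {
            return false;
        }
        docs.add(text);
        return true;
    }

    int size() {
        return docs.size();
    }

    public static void main(String[] args) {
        ToyIndex plain = new ToyIndex();
        ToyIndex keyed = new ToyIndex();
        for (int i = 0; i < 15; i++) {
            plain.addDocument("same text");
            keyed.addIfAbsent("docs/a.txt", "same text"); // hypothetical key
        }
        System.out.println("plain: " + plain.size() + ", keyed: " + keyed.size());
    }
}
```

Adding the same text 15 times leaves 15 entries in the plain store and one in the keyed store, which is the difference between the index having no identity concept and the application supplying one.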

Re: Merging Indices

2007-04-21 Thread jim shirreffs
"Lucene has no concept of "document identity" in that you can index the same document 15 times in a row and Lucene will have 15 entries. " Is this true? Whenever I run the demo indexing logic, documents already indexed are skipped. What am I missing? jim s start java org.apache.lucene.demo.In

Can indexing logic on one host update an index on another host?

2007-04-20 Thread jim shirreffs
Can indexing logic on one host update an index on another host? In my application the files I wish to index/search live in distributed vaults on "safe" hosts in the intranet. Accessing those files is strictly controlled by application logic in a (Tomcat) servlet. Crawling the vaults is not an

Re: Newbie needs help "addField"

2007-04-19 Thread jim shirreffs
Thanks to Karl and Donna, I followed your suggestions and was able to get a test driver (modified demo code) working, thanks again. jim s

Newbie needs help "addField"

2007-04-18 Thread jim shirreffs
Hi, I have been using Lucene "out of the box" since 1.4.3, wonderful full text engine, I love it. But I can't use it "out of the box" any more, I am going to have to write some code (Oh no! Mr Bill.). I am fairly certain that the code needed will be trivial, but I am unfamiliar with Lucene's A