from:"田春峰"

Reply: RE: HTML text extraction

2006-06-29 Thread 田春峰

Hi, your attachment is empty; it contains no Java source code. Liao Xuefeng <[EMAIL PROTECTED]> wrote: Hi all, I wrote my own HTML parser because it meets my requirements and does not depend on third-party libraries, and I'd like to share it (in the attachment). This class provides some static methods to
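Since the attached source never arrived, here is a minimal sketch of HTML text extraction that uses only the JDK and no third-party library. It is not Liao Xuefeng's parser. The class name `HtmlTextExtractor`, the regex approach, and the small entity table are assumptions for illustration, and a regex stripper will not cope with malformed markup the way a real parser does.

```java
import java.util.regex.Pattern;

// Minimal regex-based HTML-to-text sketch using only the JDK.
// Assumption: illustrative only, NOT the parser from the original post.
class HtmlTextExtractor {

    // <script> and <style> bodies are not visible text, so drop them whole.
    private static final Pattern SCRIPT_OR_STYLE =
            Pattern.compile("(?is)<(script|style)\\b[^>]*>.*?</\\1\\s*>");
    // HTML comments.
    private static final Pattern COMMENT = Pattern.compile("(?s)<!--.*?-->");
    // Any remaining tag, e.g. <p>, </div>, <br/>.
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]*>");
    // Runs of whitespace, collapsed to a single space.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String extract(String html) {
        String text = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        text = COMMENT.matcher(text).replaceAll(" ");
        // Replace tags with a space so adjacent block elements stay separated.
        text = TAG.matcher(text).replaceAll(" ");
        text = decodeEntities(text);
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }

    // Decodes only a handful of common named entities; a real parser
    // would handle the full entity table and numeric references.
    private static String decodeEntities(String s) {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&nbsp;", " ")
                .replace("&amp;", "&"); // last, so "&amp;lt;" becomes "&lt;"
    }

    public static void main(String[] args) {
        String html = "<html><head><style>p{color:red}</style></head>"
                + "<body><p>Hello, <b>Lucene</b> &amp; friends</p>"
                + "<script>var x = 1;</script></body></html>";
        System.out.println(extract(html)); // Hello, Lucene & friends
    }
}
```

Replacing each tag with a space keeps text from neighbouring elements apart, at the cost of splitting a word that an inline tag breaks (for example `<b>Luc</b>ene`).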

Re: Code search

2005-05-27 Thread 田春峰

Hi, Lucene is a great project to serve as a source code search engine. I have built a source code search engine based on Lucene, and it performs very well. Unfortunately, my version is in Chinese. The URL is: http://www.domolo.com/domolo/ctrlc/index.aspx It searches 101732 j
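The core structure behind a Lucene-based code search engine is an inverted index that maps each token to the files containing it. The toy sketch below shows that idea in plain Java; it does not use the Lucene API and is not how the domolo engine was implemented. The class `SourceIndex`, the identifier-based tokenizer, and the AND-only query semantics are illustrative assumptions.

```java
import java.util.*;

// Toy inverted index over source files. Assumption: plain Java written to
// illustrate the concept; it is not the Lucene API.
class SourceIndex {
    // token -> names of files containing it (sorted for stable output)
    private final Map<String, SortedSet<String>> postings = new HashMap<>();

    // Split on anything that cannot appear in a Java identifier and
    // lower-case the tokens so searches are case-insensitive.
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        for (String t : source.split("[^A-Za-z0-9_$]+")) {
            if (!t.isEmpty()) tokens.add(t.toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    void addFile(String fileName, String source) {
        for (String token : tokenize(source)) {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(fileName);
        }
    }

    // AND query: return the files that contain every query term.
    SortedSet<String> search(String query) {
        SortedSet<String> result = null;
        for (String term : tokenize(query)) {
            SortedSet<String> files =
                    postings.getOrDefault(term, Collections.emptySortedSet());
            if (result == null) result = new TreeSet<>(files);
            else result.retainAll(files);
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        SourceIndex index = new SourceIndex();
        index.addFile("Reader.java", "class Reader { String readLine() { return null; } }");
        index.addFile("Writer.java", "class Writer { void writeLine(String s) {} }");
        System.out.println(index.search("String readLine")); // [Reader.java]
        System.out.println(index.search("class"));           // [Reader.java, Writer.java]
    }
}
```

Lucene supplies what this toy omits: on-disk postings, relevance scoring, phrase queries, and pluggable analyzers, which a code search engine would tune, for example to split `camelCase` identifiers.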

Re: Lucene - PDFBox

2005-05-25 Thread 田春峰

Hi, I agree with Ben Litchfield. Before feeding extracted text into the Lucene indexer, you should check the extracted text. For my part, I now use `java org.pdfbox.ExtractText` to get the text from PDFs. [quote] "Ben Litchfield" <[EMAIL PROTECTED]> Can you run the following command line applica
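One way to "check the extracted text" before indexing is a cheap heuristic that rejects blank output, or output dominated by characters that usually signal a failed extraction. The sketch assumes the text has already been obtained (for instance with `org.pdfbox.ExtractText`); the class `ExtractedTextCheck`, the choice of suspicious characters, and the 0.9 threshold are hypothetical and would need tuning on real documents.

```java
// Hypothetical sanity check for PDF-extracted text before it reaches the
// indexer. The heuristic and threshold are illustrative assumptions.
class ExtractedTextCheck {

    // Characters that often indicate a garbled extraction: the Unicode
    // replacement character, non-whitespace control characters, and
    // private-use code points (e.g. glyphs from fonts with no text mapping).
    static boolean isSuspicious(int cp) {
        return cp == 0xFFFD
                || (Character.isISOControl(cp) && !Character.isWhitespace(cp))
                || Character.getType(cp) == Character.PRIVATE_USE;
    }

    // True if the text is non-blank and at least minReadableRatio of its
    // code points are not suspicious. Works for CJK text too, since CJK
    // ideographs are letters, not suspicious characters.
    static boolean looksUsable(String text, double minReadableRatio) {
        if (text == null || text.trim().isEmpty()) return false;
        long total = text.codePoints().count();
        long suspicious = text.codePoints().filter(ExtractedTextCheck::isSuspicious).count();
        return (double) (total - suspicious) / total >= minReadableRatio;
    }

    public static void main(String[] args) {
        System.out.println(looksUsable("Hello PDFBox, page 1", 0.9)); // true
        System.out.println(looksUsable("\uFFFD\uFFFD\uFFFD ok", 0.9)); // false
        System.out.println(looksUsable("   ", 0.9));                  // false
    }
}
```

Documents that fail the check can be logged and skipped rather than indexed, so that garbage tokens never enter the index.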