On Apr 19, 2005, at 7:37 AM, Eric Chow wrote:
> Hello,
> Is there any RTF text extractor for Lucene?
You can use Swing's RTF classes (RTFEditorKit and DefaultStyledDocument) to do this. The following is from the Lucene in Action code (http://www.lucenebook.com/search?query=rtf):

  public Document getDocument(InputStream is)
    throws DocumentHandlerException {

    String bodyText = null;

    DefaultStyledDocument styledDoc = new DefaultStyledDocument();
    try {
      new RTFEditorKit().read(is, styledDoc, 0);
      bodyText = styledDoc.getText(0, styledDoc.getLength());
    }
    catch (IOException e) {
      throw new DocumentHandlerException(
        "Cannot extract text from a RTF document", e);
    }
    catch (BadLocationException e) {
      throw new DocumentHandlerException(
        "Cannot extract text from a RTF document", e);
    }

    if (bodyText != null) {
      Document doc = new Document();
      doc.add(Field.UnStored("body", bodyText));
      return doc;
    }
    return null;
  }

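If you only want to try the extraction step on its own, without Lucene or the book's DocumentHandlerException on the classpath, something like the following minimal sketch should work. The class name RtfTextExtractor and the method extractText are made up for this example, and wrapping BadLocationException in an IOException is just one possible choice; only the javax.swing.text calls are the same ones used in the method above.

```java
import java.io.IOException;
import java.io.InputStream;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.rtf.RTFEditorKit;

// Hypothetical helper for illustration; not part of Lucene or of the
// Lucene in Action code.
final class RtfTextExtractor {

    private RtfTextExtractor() {
    }

    // Reads RTF from the stream and returns its plain text content.
    static String extractText(InputStream is) throws IOException {
        DefaultStyledDocument styledDoc = new DefaultStyledDocument();
        try {
            // Parse the RTF markup into the Swing document model.
            new RTFEditorKit().read(is, styledDoc, 0);
            // Pull the text back out, leaving the formatting behind.
            return styledDoc.getText(0, styledDoc.getLength());
        } catch (BadLocationException e) {
            // Assumption: report a bad offset as an I/O failure so that
            // callers only have one exception type to handle.
            throw new IOException("Cannot extract text from an RTF document", e);
        }
    }
}
```

You would pass it any InputStream over RTF data, for example a FileInputStream, and hand the returned string to whatever field you index as the body.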
Erik