Okay, so I'm very new to Lucene, so this may be my mistake, but I can get it to index .txt files. When I try to index Word documents (using POI), the program starts running, and when it reaches a .doc file I get the following error:
    Exception in thread "main" org.apache.poi.hpsf.IllegalPropertySetDataException: The property set claims to have a size of 16 bytes. However, it exceeds 16 bytes.
        at org.apache.poi.hpsf.Section.<init>(Section.java:255)
        at org.apache.poi.hpsf.PropertySet.init(PropertySet.java:454)
        at org.apache.poi.hpsf.PropertySet.<init>(PropertySet.java:249)
        at org.apache.poi.hpsf.PropertySetFactory.create(PropertySetFactory.java:61)
        at org.apache.poi.POIDocument.getPropertySet(POIDocument.java:92)
        at org.apache.poi.POIDocument.readProperties(POIDocument.java:69)
        at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:147)
        at org.apache.poi.hwpf.extractor.WordExtractor.<init>(WordExtractor.java:56)
        at org.apache.poi.hwpf.extractor.WordExtractor.<init>(WordExtractor.java:48)
        at Indexer.indexFile(Indexer.java:76)
        at Indexer.indexDirectory(Indexer.java:57)
        at Indexer.index(Indexer.java:38)
        at Indexer.main(Indexer.java:20)

My code is as follows:

    // Imports at the top of Indexer.java (Lucene 2.x, POI HWPF)
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileReader;
    import java.io.IOException;

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.poi.hwpf.extractor.WordExtractor;

    private static void indexFile(IndexWriter writer, File f) throws IOException {
        // Skip files that are hidden, missing or unreadable
        if (f.isHidden() || !f.exists() || !f.canRead()) {
            return;
        }
        System.out.println("Adding " + f.getCanonicalPath() + " to the index.");
        Document doc = new Document();

        // For .doc files: extract the plain text with POI's WordExtractor
        if (f.getName().endsWith(".doc")) {
            FileInputStream docfin = new FileInputStream(f.getAbsolutePath());
            try {
                WordExtractor docextractor = new WordExtractor(docfin);
                String content = docextractor.getText();
                doc.add(new Field("contents", content, Field.Store.NO, Field.Index.TOKENIZED));
            } finally {
                docfin.close(); // release the file handle even if POI throws
            }
        }
        // For .txt files: let Lucene read the text from a Reader
        else if (f.getName().endsWith(".txt")) {
            doc.add(new Field("contents", new FileReader(f)));
        }
        doc.add(new Field("filename", f.getCanonicalPath(), Field.Store.YES, Field.Index.TOKENIZED));
        writer.addDocument(doc);
    }

(I think I've included everything that's necessary.)

Thanks in advance for any help.

--
View this message in context: http://www.nabble.com/Problem-indexing-Word-Documents-tf4876643.html#a13954702
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
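Because the exception escapes indexFile, one malformed .doc aborts the whole indexing run. A common workaround, while the underlying POI parsing problem is looked into, is to catch extraction failures per file and skip that file. POI's IllegalPropertySetDataException derives from RuntimeException (via HPSFRuntimeException), so catching IOException alone would not stop it. The following is a minimal, stdlib-only sketch of that skip-and-continue pattern. The TextExtractor interface, the extractOrSkip helper, the fake extractor and the file names are all hypothetical stand-ins; in the real indexer the .doc branch would construct POI's WordExtractor inside the extractor. It does not fix the property-set parsing itself.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class SkipUnreadableDocs {

    // Hypothetical stand-in for a format-specific extractor; in the real
    // indexer the .doc case would wrap POI's WordExtractor here.
    interface TextExtractor {
        String extract(File f) throws IOException;
    }

    // Files that could not be parsed, kept so they can be reported afterwards.
    static final List<String> skipped = new ArrayList<>();

    // Hypothetical helper: returns the extracted text, or null if the
    // extractor failed with an IOException or any unchecked exception.
    static String extractOrSkip(File f, TextExtractor extractor) {
        try {
            return extractor.extract(f);
        } catch (IOException | RuntimeException e) {
            // Record the failure and keep going instead of aborting the run.
            skipped.add(f.getName() + ": " + e.getClass().getSimpleName());
            return null;
        }
    }

    public static void main(String[] args) {
        // Hypothetical extractor that fails on one file, mimicking a .doc
        // whose property set POI rejects.
        TextExtractor fake = f -> {
            if (f.getName().equals("broken.doc")) {
                throw new IllegalStateException("bad property set");
            }
            return "text of " + f.getName();
        };

        List<String> indexed = new ArrayList<>();
        for (String name : new String[] {"a.doc", "broken.doc", "b.doc"}) {
            String content = extractOrSkip(new File(name), fake);
            if (content != null) {
                // The real indexer would build a Document and call writer.addDocument(...) here.
                indexed.add(content);
            }
        }
        System.out.println("indexed: " + indexed);
        System.out.println("skipped: " + skipped);
    }
}
```

Running main prints `indexed: [text of a.doc, text of b.doc]` followed by `skipped: [broken.doc: IllegalStateException]`. Recording the skipped names rather than silently ignoring failures makes it easy to check afterwards which .doc files POI could not handle.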