https://bz.apache.org/bugzilla/show_bug.cgi?id=60471
Bug ID: 60471
Summary: Not loading AlternateContent in XWPF
Product: POI
Version: 3.16-dev
Hardware: PC
Status: NEW
Severity: normal
Priority: P2
Component: XWPF
Assignee: dev@poi.apache.org
Reporter: talli...@mitre.org
Target Milestone: ---

Created attachment 34522
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=34522&action=edit
triggering file based on testWORD_2006ml.docx in Tika

XWPFDocument's onDocumentLoad() looks for paragraphs, tables and SDTs at the main level of the body. As we saw with Bug 54849 (SDTs), there can be other intervening structures between the body and text-containing elements.

I recently noticed that AlternateContent elements can also appear at the body level, and we should probably add those to our document model.

To create this test file, I added a title page via Word's default "add a title page" function.

In the SAX parser that I added to Tika, I chose to extract text from the Fallback section, on the theory that it would have the more easily parseable content. If we're modeling read/write in our DOM/XWPFDocument, we'll probably want to point to both Fallback and Choice?

Unit test:

    public void testAlternateContent() throws IOException {
        XWPFDocument doc = XWPFTestDataSamples.openSampleDocument("testAlternateContent.docx");
        XWPFWordExtractor extractor = new XWPFWordExtractor(doc);
        String txt = extractor.getText();
        assertContainsSpecificCount("engaging abstract", txt, 1);
        assertContainsSpecificCount("MyDocumentTitle", txt, 1);
        assertContainsSpecificCount("MyDocumentSubtitle", txt, 1);
    }

    private void assertContainsSpecificCount(String needle, String haystack, int expectedCount) {
        int index = haystack.indexOf(needle);
        int found = 0;
        while (index > -1) {
            found++;
            index = haystack.indexOf(needle, index+1);
        }
        assertEquals(expectedCount, found);
    }
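For illustration, here is a minimal sketch of the "read only the Fallback branch" idea described above. It is not the Tika parser and not POI code: it uses only the JDK's SAX API, and the class name FallbackTextSketch, the extract() helper, and the SAMPLE markup are hypothetical. The sample XML is a heavily simplified stand-in for a real document.xml, in which each branch would wrap its text in drawing or VML markup.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: collect w:t text while skipping everything under mc:Choice,
// so text duplicated in Choice and Fallback is reported once.
class FallbackTextSketch {
    static final String MC_NS = "http://schemas.openxmlformats.org/markup-compatibility/2006";
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Simplified, made-up body: one AlternateContent followed by a plain paragraph.
    static final String SAMPLE =
        "<w:document xmlns:w=\"" + W_NS + "\" xmlns:mc=\"" + MC_NS + "\"><w:body>"
        + "<mc:AlternateContent>"
        + "<mc:Choice Requires=\"wps\"><w:p><w:r><w:t>MyDocumentTitle</w:t></w:r></w:p></mc:Choice>"
        + "<mc:Fallback><w:p><w:r><w:t>MyDocumentTitle</w:t></w:r></w:p></mc:Fallback>"
        + "</mc:AlternateContent>"
        + "<w:p><w:r><w:t>Body text</w:t></w:r></w:p>"
        + "</w:body></w:document>";

    static class Handler extends DefaultHandler {
        final StringBuilder text = new StringBuilder();
        int choiceDepth = 0;    // greater than 0 while inside any mc:Choice
        boolean inText = false; // true while inside a w:t we want to keep

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            if (MC_NS.equals(uri) && "Choice".equals(local)) {
                choiceDepth++;
            } else if (W_NS.equals(uri) && "t".equals(local) && choiceDepth == 0) {
                inText = true;
            }
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            if (MC_NS.equals(uri) && "Choice".equals(local)) {
                choiceDepth--;
            } else if (W_NS.equals(uri) && "t".equals(local)) {
                inText = false;
            } else if (W_NS.equals(uri) && "p".equals(local) && choiceDepth == 0) {
                text.append('\n'); // one line per kept paragraph
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inText) {
                text.append(ch, start, length);
            }
        }
    }

    // Parses the given XML string and returns the text outside mc:Choice.
    static String extract(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // required to match on namespace URIs
            Handler handler = new Handler();
            factory.newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.text.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(extract(SAMPLE));
    }
}
```

Skipping Choice rather than whitelisting Fallback keeps ordinary body paragraphs in the output. A read/write document model would presumably need to retain both branches instead, as the report suggests.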