GitHub user akhikhl opened a pull request:
https://github.com/apache/poi/pull/3
fix for information loss on footnotes/endnotes within XWPFRun.toString
Dear Apache POI Team,
Please consider the following problem: whenever an MS Word document with
footnotes/endnotes is parsed with XWPFWordExtractor, the location of each
footnote/endnote reference is lost. This loss is clearly visible in, for
example, Apache Tika output.
To reproduce the problem, insert the following code into
TestXWPFWordExtractor.testFootnotes:
    java.io.FileWriter w = new java.io.FileWriter(
        new java.io.File(System.getProperty("user.home"), "footnotes.output.txt"));
    try {
        w.write(extractor.getText());
    } finally {
        w.close();
    }
Then inspect the content of "footnotes.output.txt". It contains "Eto ochen
prostoy text so snoskoy"; between "prostoy" and "text" there should be a
footnote reference, but it is lost.
SOLUTION:
I suggest introducing additional markup such as [footnoteRef:num] and
[endnoteRef:num], which will allow applications to render footnote
references correctly.
Please see the commit details.
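
To illustrate how a consuming application might use the proposed markup, here
is a minimal sketch. It is not part of the patch. The class FootnoteMarkers
and its methods are hypothetical names, and the marker syntax is assumed to be
exactly [footnoteRef:num] / [endnoteRef:num] with a decimal number. The marker
position and the number 1 in the sample string are also assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Apache POI or of this pull request.
class FootnoteMarkers {
    // Assumed marker syntax from the proposal: [footnoteRef:num] or [endnoteRef:num].
    private static final Pattern MARKER =
        Pattern.compile("\\[(footnote|endnote)Ref:(\\d+)\\]");

    // Returns one entry per marker, formatted as "kind:number@offset",
    // where offset is the marker's character position in the text.
    static List<String> findRefs(String text) {
        List<String> refs = new ArrayList<String>();
        Matcher m = MARKER.matcher(text);
        while (m.find()) {
            refs.add(m.group(1) + ":" + m.group(2) + "@" + m.start());
        }
        return refs;
    }

    // Removes all markers, for callers that want only the plain text.
    static String stripRefs(String text) {
        return MARKER.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // Sample input; the marker placement and number are assumed.
        String text = "Eto ochen prostoy[footnoteRef:1] text so snoskoy";
        System.out.println(findRefs(text));
        System.out.println(stripRefs(text));
    }
}
```

An application that renders footnotes could replace each marker with a
superscript number or a hyperlink, while one that does not care about them
could call stripRefs to get the plain text.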
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/akhikhl/poi enhanced-footnote-support
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/poi/pull/3.patch