Hello,

I am currently building an open-source tool to disarm Office files, named DocBleach [1], but I am stuck on some specifics of the OLE2 format.
First of all, I would like to thank you for the great library that is Apache POI!

When opening disarmed OLE2 files in Office Excel/Word on Windows 10 (I haven't checked other versions), an alert is displayed, which depends on the editor:

- Excel says that the file is corrupted and needs to be repaired. The error: "Lost Visual Basic project".
  ~> Once repaired, the Macro Viewer is unable to tell us the names of the Macros.
- Word tells us that the file is unsafe because it contains Macros.
  ~> The "Macro Viewer" is able to tell us the names of the Macros.

----

As you know, OLE2 files are "file systems in a file". To remove the Macros from a document, I remove the "Macros" directory.

Here is a sample log, for the record. (The process being the same for Word and Excel, I'll only give one.)
The relevant code is available at [2].
Sample files, with their sanitised forms (suffixed "-free"):
~> https://www.mediafire.com/folder/yh122tgbyzadw/Sample_2017_May_1

#####
$ java -jar docbleach.jar -vv -in ../Goodware/macro.doc -out - > macro-free.doc
[main] DEBUG xyz.docbleach.cli.Main - Log Level: TRACE
[main] TRACE xyz.docbleach.api.bleach.CompositeBleach - Using bleach: OLE2 Bleach
[main] DEBUG xyz.docbleach.module.ole2.OLE2Bleach - Entries before: [CompObj, 1Table, SummaryInformation, WordDocument, DocumentSummaryInformation, Macros]
[main] DEBUG xyz.docbleach.module.ole2.OLE2Bleach - Root ClassID: {00020906-0000-0000-C000-000000000046}
[main] TRACE xyz.docbleach.module.ole2.OLE2Bleach - copyNodesRecursively: SummaryInformation, parent: org.apache.poi.poifs.filesystem.DirectoryNode@2e5c649
[main] TRACE xyz.docbleach.module.ole2.OLE2Bleach - copyNodesRecursively: DocumentSummaryInformation, parent: org.apache.poi.poifs.filesystem.DirectoryNode@2e5c649
[main] TRACE xyz.docbleach.module.ole2.OLE2Bleach - copyNodesRecursively: WordDocument, parent: org.apache.poi.poifs.filesystem.DirectoryNode@2e5c649
[main] TRACE xyz.docbleach.module.ole2.OLE2Bleach - copyNodesRecursively: 1Table, parent: org.apache.poi.poifs.filesystem.DirectoryNode@2e5c649
[main] DEBUG xyz.docbleach.module.ole2.OLE2Bleach - Entries after: [1Table, SummaryInformation, WordDocument, DocumentSummaryInformation]
#####

The CompObj and Macros entries are removed (not copied), so the Macros *can't* work.

I have tried many things, especially with Excel files (which only contain the Workbook, SummaryInformation and DocumentSummaryInformation entries), and I found that the Workbook entry is at fault: the two summaries do not contain the "macro reference", and since I recreate the file from scratch, the reference has to be in one of the copied entries.

If I understand correctly, there are "entries" inside the Workbook/WordDocument holding the Macros. I found "stwUser" in the Word documentation [3], and I assume that I need to remove it, but I couldn't find a unified API to do so for Word/Excel/PowerPoint documents.

My question: is there an "easy" API to interact with these entries and remove parts of them?
If so, could you please give me some leads/examples on how to do it?
If not, do you have tips on how to achieve something similar?

I could iterate over the Workbook/Document and copy it over manually, without the Macros… but if the API for that is not unified, I would have to do it separately for XLS/Word/PPT files, right? That doesn't seem like the easy path! :-(

Thanks in advance!

- PunKeel

[1]: https://github.com/docbleach/DocBleach
[2]: https://github.com/docbleach/DocBleach/blob/master/module/module-office/src/main/java/xyz/docbleach/module/ole2/OLE2Bleach.java
[3]: https://msdn.microsoft.com/en-us/library/dd923194(v=office.12).aspx