Hello, I've found a solution for Excel files: removing the "ObProj" record from the Workbook.
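To illustrate what "removing a record" means at the byte level, here is a minimal sketch in plain Java, without POI. It is not the code from the gist. It assumes the Workbook stream is an unencrypted BIFF8 record sequence (2-byte little-endian record id, 2-byte little-endian body length, then the body) and that ObProj has record id 0x00D3, as listed in the [MS-XLS] record enumeration. The class and method names are made up for the example.

```java
import java.io.ByteArrayOutputStream;

/**
 * Illustrative sketch only: drops every record with a given id from a raw
 * BIFF8 record stream (the bytes of the "Workbook" entry).
 * Class and method names are hypothetical, not part of Apache POI.
 */
final class BiffRecordFilter {
    /** Assumed record id of ObProj (211) from the [MS-XLS] record enumeration. */
    static final int OBPROJ_SID = 0x00D3;

    /** Returns a copy of the stream without the records whose id equals sidToDrop. */
    static byte[] dropRecords(byte[] stream, int sidToDrop) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(stream.length);
        int pos = 0;
        // Each record is: id (2 bytes LE), body length (2 bytes LE), body.
        while (pos + 4 <= stream.length) {
            int sid = readUShortLE(stream, pos);
            int len = readUShortLE(stream, pos + 2);
            int end = pos + 4 + len;
            if (end > stream.length) {
                throw new IllegalArgumentException("Truncated record at offset " + pos);
            }
            if (sid != sidToDrop) {
                out.write(stream, pos, end - pos); // keep header and body untouched
            }
            pos = end;
        }
        // Copy any trailing bytes that are too short to form a record header.
        out.write(stream, pos, stream.length - pos);
        return out.toByteArray();
    }

    private static int readUShortLE(byte[] b, int off) {
        return (b[off] & 0xFF) | ((b[off + 1] & 0xFF) << 8);
    }

    public static void main(String[] args) {
        // Toy stream, not a real workbook: a 2-byte record (id 0x0809),
        // an empty ObProj record (id 0x00D3) and an empty EOF record (id 0x000A).
        byte[] sample = {
            0x09, 0x08, 0x02, 0x00, 0x00, 0x06,
            (byte) 0xD3, 0x00, 0x00, 0x00,
            0x0A, 0x00, 0x00, 0x00
        };
        byte[] cleaned = dropRecords(sample, OBPROJ_SID);
        System.out.println(sample.length + " bytes in, " + cleaned.length + " bytes out");
    }
}
```

Keep in mind that dropping bytes shifts everything that follows, and other records (for instance the sheet positions stored in the BoundSheet records) hold absolute offsets into the stream. A real implementation has to fix those up, or let POI re-serialize the workbook, so this only shows the record walk itself.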
The code I'm using is available here:
https://gist.github.com/PunKeel/0e72ccde78cb0150383a9ced094c2bce

I don't think this is the cleanest way to achieve it, so I'm open to suggestions.

It also doesn't work for Word/PPT files. If I understand correctly (let's hope I do), the (Current User, PowerPoint Document) and (WordDocument, 0Table, 1Table) streams need to be edited by hand because Apache POI lacks the APIs for these formats.
~> How? Are they like "DocumentStreams"? May I use the "RecordFactory"? Am I right?

I am more than open to suggestions/help, please!

Best regards,

> On 1 May 2017, at 6:15 PM, PunKeel <punk...@me.com> wrote:
>
> Hello,
>
> I am currently building an open-source tool to disarm Office files, named DocBleach [1],
> but I am stuck on some specifics of the OLE2 format.
>
> First of all, I would like to thank you for the great library that is Apache POI!
>
> When opening disarmed OLE2 files in Office Excel/Word on Windows 10 (haven't checked other versions),
> an alert is displayed, depending on the editor:
>
> - Excel says that the file is corrupted and needs to be repaired. The error: "Lost Visual Basic project".
>   ~> Once repaired, the Macro Viewer is unable to tell us the name of the Macros
>
> - Word tells us that the file is unsafe because it contains Macros.
>   ~> The "Macro Viewer" is able to tell us the name of the Macros
>
> ----
>
> As you know, OLE2 files are "file systems in a file".
> In order to remove the Macros of a document, I remove the Macros "directory".
>
> Sample log, for the record. (The process being the same for Word/Excel, I'll only give one).
> Relevant code is available at [2]
> Sample files, with their sanitised form (named "-free")
> ~> https://www.mediafire.com/folder/yh122tgbyzadw/Sample_2017_May_1
>
> #####
>
> $ java -jar docbleach.jar -vv -in ../Goodware/macro.doc -out - > macro-free.doc
> [main] DEBUG xyz.docbleach.cli.Main - Log Level: TRACE
> [main] TRACE xyz.docbleach.api.bleach.CompositeBleach - Using bleach: OLE2 Bleach
> [main] DEBUG xyz.docbleach.module.ole2.OLE2Bleach - Entries before: [CompObj, 1Table, SummaryInformation, WordDocument, DocumentSummaryInformation, Macros]
> [main] DEBUG xyz.docbleach.module.ole2.OLE2Bleach - Root ClassID: {00020906-0000-0000-C000-000000000046}
>
> [main] TRACE xyz.docbleach.module.ole2.OLE2Bleach - copyNodesRecursively: SummaryInformation, parent: org.apache.poi.poifs.filesystem.DirectoryNode@2e5c649
> [main] TRACE xyz.docbleach.module.ole2.OLE2Bleach - copyNodesRecursively: DocumentSummaryInformation, parent: org.apache.poi.poifs.filesystem.DirectoryNode@2e5c649
> [main] TRACE xyz.docbleach.module.ole2.OLE2Bleach - copyNodesRecursively: WordDocument, parent: org.apache.poi.poifs.filesystem.DirectoryNode@2e5c649
> [main] TRACE xyz.docbleach.module.ole2.OLE2Bleach - copyNodesRecursively: 1Table, parent: org.apache.poi.poifs.filesystem.DirectoryNode@2e5c649
>
> [main] DEBUG xyz.docbleach.module.ole2.OLE2Bleach - Entries after: [1Table, SummaryInformation, WordDocument, DocumentSummaryInformation]
>
> #####
>
> The CompObj and Macros entries are removed (not copied), so the Macros *can't* work.
>
> I've been trying a lot of things, especially with Excel files (they only contain a Workbook,
> SummaryInformation and DocumentSummaryInformation), and I've found out the Workbook
> was at fault: the two summaries did not contain the "macro reference", and I recreate the file
> from scratch, so it has to be in an entry.
>
> If I understand correctly, there are "entries" in the Workbook/WordDocument holding the Macros.
> I found "stwUser" in the Word documentation [3], and I assume that I need to remove it, but couldn't
> find a unified API to achieve it for Word/Excel/PowerPoint documents.
>
> My question: is there an "easy" API to interact with these entries, removing parts of them?
> If so, could you please give me some leads/examples on how to do it?
> If not, do you have tips on how to achieve something similar?
>
> I could iterate over the Workbook/Document to copy it over manually, without the Macros…
> but if the API allowing it is not unified, I would have to do it for XLS/Word/PPT files, right?
> Doesn't seem like the easy path! :-(
>
> Thanks in advance!
>
> - PunKeel
>
> [1]: https://github.com/docbleach/DocBleach
> [2]: https://github.com/docbleach/DocBleach/blob/master/module/module-office/src/main/java/xyz/docbleach/module/ole2/OLE2Bleach.java
> [3]: https://msdn.microsoft.com/en-us/library/dd923194(v=office.12).aspx