Hello,

I've found the solution for Excel files: removing the "ObProj" record from
the Workbook.

The code I'm using is available here: 
https://gist.github.com/PunKeel/0e72ccde78cb0150383a9ced094c2bce 
I don't think this is the cleanest way to achieve it, so I'm open to 
suggestions.
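
For readers without the gist at hand, here is a minimal sketch of the idea,
written against plain Java rather than POI. It only illustrates BIFF record
framing (2-byte record id, 2-byte payload length, both little-endian, then
the payload) and how records with a given id can be skipped. The class and
method names are hypothetical, and the ObProj id 0x00D3 is taken from
[MS-XLS] and should be checked against the spec. It is not the code from the
gist.

```java
import java.io.ByteArrayOutputStream;

// Illustrative only: filters a raw BIFF8 record stream by record id (sid).
final class BiffRecordFilter {
    // Assumed ObProj record id, per [MS-XLS]; verify before relying on it.
    static final int OBPROJ_SID = 0x00D3;

    /** Returns a copy of the stream without the records whose id is sidToDrop. */
    static byte[] stripRecords(byte[] stream, int sidToDrop) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(stream.length);
        int pos = 0;
        while (pos < stream.length) {
            if (pos + 4 > stream.length) {
                throw new IllegalArgumentException("Truncated header at offset " + pos);
            }
            // Record header: id and payload length, both 16-bit little-endian.
            int sid = (stream[pos] & 0xFF) | ((stream[pos + 1] & 0xFF) << 8);
            int len = (stream[pos + 2] & 0xFF) | ((stream[pos + 3] & 0xFF) << 8);
            int end = pos + 4 + len;
            if (end > stream.length) {
                throw new IllegalArgumentException("Truncated record at offset " + pos);
            }
            // Copy every record (header and payload) except the one being dropped.
            if (sid != sidToDrop) {
                out.write(stream, pos, end - pos);
            }
            pos = end;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] sample = {
            0x09, 0x08, 0x02, 0x00, 0x00, 0x06, // BOF-like record, 2-byte payload
            (byte) 0xD3, 0x00, 0x00, 0x00,      // ObProj id, empty payload
            0x0A, 0x00, 0x00, 0x00              // EOF id, empty payload
        };
        byte[] cleaned = stripRecords(sample, OBPROJ_SID);
        System.out.println(sample.length + " -> " + cleaned.length + " bytes");
    }
}
```

Note that filtering raw bytes like this shifts every later offset in the
stream (BoundSheet records, for instance, store the stream position of each
sheet), so on a real Workbook the removal should go through a library that
re-serialises the records and recomputes those positions.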

It also doesn't work for Word/PPT files. If I understand correctly (let's hope
I do), the (Current User, PowerPoint Document) and (WordDocument, 0Table,
1Table) streams need to be edited by hand because Apache POI lacks the APIs
for these formats.
~> How? Are they like "DocumentStreams"? May I use the "RecordFactory"?

Am I right? I am more than open to suggestions/help, please!

Best regards,


> On 1 May 2017, at 6:15 PM, PunKeel <punk...@me.com> wrote:
> 
> Hello,
> 
> I am currently building open-source software to disarm Office files, named
> DocBleach [1],
> but I am stuck on some specifics of the OLE2 format.
> 
> First of all, I would like to thank you for the great library that is Apache 
> POI!
> 
> When opening disarmed OLE2 files in Office Excel/Word on Windows 10 (haven't 
> checked other versions),
> an alert is displayed, depending on the editor:
> 
> - Excel says that the file is corrupted and needs to be repaired. The error: 
> "Lost Visual Basic project".
> ~> Once repaired, the Macro Viewer is unable to tell us the name of the Macros
> 
> - Word tells us that the file is unsafe because it contains Macros.
> ~> The "Macro Viewer" is able to tell us the name of the Macros
> 
> ----
> 
> As you know, OLE2 files are "file systems in a file".
> In order to remove the Macros of a document, I remove the Macros "directory".
> 
> Sample log, for the record. (The process being the same for Word/Excel, I'll 
> only give one).
> The relevant code is available at [2]
> Sample files, with their sanitised form (named "-free")
> ~> https://www.mediafire.com/folder/yh122tgbyzadw/Sample_2017_May_1
> 
> #####
> 
> $ java -jar docbleach.jar -vv -in ../Goodware/macro.doc -out - > macro-free.doc
> [main] DEBUG xyz.docbleach.cli.Main - Log Level: TRACE
> [main] TRACE xyz.docbleach.api.bleach.CompositeBleach - Using bleach: OLE2 Bleach
> [main] DEBUG xyz.docbleach.module.ole2.OLE2Bleach - Entries before: [CompObj, 1Table, SummaryInformation, WordDocument, DocumentSummaryInformation, Macros]
> [main] DEBUG xyz.docbleach.module.ole2.OLE2Bleach - Root ClassID: {00020906-0000-0000-C000-000000000046}
> 
> [main] TRACE xyz.docbleach.module.ole2.OLE2Bleach - copyNodesRecursively: SummaryInformation, parent: org.apache.poi.poifs.filesystem.DirectoryNode@2e5c649
> [main] TRACE xyz.docbleach.module.ole2.OLE2Bleach - copyNodesRecursively: DocumentSummaryInformation, parent: org.apache.poi.poifs.filesystem.DirectoryNode@2e5c649
> [main] TRACE xyz.docbleach.module.ole2.OLE2Bleach - copyNodesRecursively: WordDocument, parent: org.apache.poi.poifs.filesystem.DirectoryNode@2e5c649
> [main] TRACE xyz.docbleach.module.ole2.OLE2Bleach - copyNodesRecursively: 1Table, parent: org.apache.poi.poifs.filesystem.DirectoryNode@2e5c649
> 
> [main] DEBUG xyz.docbleach.module.ole2.OLE2Bleach - Entries after: [1Table, SummaryInformation, WordDocument, DocumentSummaryInformation]
> 
> #####
> 
> The CompObj and Macros entries are removed (not copied), so the Macros 
> *can't* work.
> 
> I've been trying a lot of things, especially with Excel files (they only
> contain a Workbook, SummaryInformation and DocumentSummaryInformation), and
> I've found that the Workbook was at fault: the two summaries did not contain
> the "macro reference", and since I recreate the file from scratch, the
> reference has to be in an entry.
> 
> If I understand correctly, there are "entries" in the Workbook/WordDocument 
> holding the Macros.
> I found "stwUser" in the Word documentation [3], and I assume that I need to
> remove it, but couldn't find a unified API to achieve it for
> Word/Excel/PowerPoint documents.
> 
> My question: is there an "easy" API to interact with these entries, removing 
> parts of it?
> If so, could you please give me some leads/examples on how to do it?
> If not, do you have tips on how to achieve something similar?
> 
> I could iterate over the Workbook/Document to copy it over manually, without 
> the Macros…
> but if the API allowing it is not unified, I would have to do it for 
> XLS/Word/PPT files, right?
> Doesn't seem like the easy path! :-(
> 
> Thanks in advance!
> 
> - PunKeel
> 
> [1]: https://github.com/docbleach/DocBleach
> [2]: 
> https://github.com/docbleach/DocBleach/blob/master/module/module-office/src/main/java/xyz/docbleach/module/ole2/OLE2Bleach.java
> [3]: https://msdn.microsoft.com/en-us/library/dd923194(v=office.12).aspx
> 
