Hello,

I am currently building an open-source tool to disarm Office files, named
DocBleach [1],
but I am stuck on some specifics of the OLE2 format.

First of all, I would like to thank you for the great library that is Apache 
POI!

When opening disarmed OLE2 files in Office Excel/Word on Windows 10 (I haven't
checked other versions),
an alert is displayed, which differs depending on the application:

- Excel says that the file is corrupted and needs to be repaired. The error: 
"Lost Visual Basic project".
~> Once repaired, the Macro Viewer is unable to tell us the name of the Macros

- Word tells us that the file is unsafe because it contains Macros.
~> The "Macro Viewer" is able to tell us the name of the Macros

----

As you know, OLE2 files are "file systems in a file".
In order to remove the Macros of a document, I remove the Macros "directory".

Here is a sample log, for the record. (The process is the same for Word and
Excel, so I'll only give one.)
The relevant code is available at [2].
Sample files, with their sanitised versions (suffixed "-free"):
~> https://www.mediafire.com/folder/yh122tgbyzadw/Sample_2017_May_1

#####

$ java -jar docbleach.jar -vv -in ../Goodware/macro.doc -out - > macro-free.doc
[main] DEBUG xyz.docbleach.cli.Main - Log Level: TRACE
[main] TRACE xyz.docbleach.api.bleach.CompositeBleach - Using bleach: OLE2 Bleach
[main] DEBUG xyz.docbleach.module.ole2.OLE2Bleach - Entries before: [CompObj, 1Table, SummaryInformation, WordDocument, DocumentSummaryInformation, Macros]
[main] DEBUG xyz.docbleach.module.ole2.OLE2Bleach - Root ClassID: {00020906-0000-0000-C000-000000000046}

[main] TRACE xyz.docbleach.module.ole2.OLE2Bleach - copyNodesRecursively: SummaryInformation, parent: org.apache.poi.poifs.filesystem.DirectoryNode@2e5c649
[main] TRACE xyz.docbleach.module.ole2.OLE2Bleach - copyNodesRecursively: DocumentSummaryInformation, parent: org.apache.poi.poifs.filesystem.DirectoryNode@2e5c649
[main] TRACE xyz.docbleach.module.ole2.OLE2Bleach - copyNodesRecursively: WordDocument, parent: org.apache.poi.poifs.filesystem.DirectoryNode@2e5c649
[main] TRACE xyz.docbleach.module.ole2.OLE2Bleach - copyNodesRecursively: 1Table, parent: org.apache.poi.poifs.filesystem.DirectoryNode@2e5c649

[main] DEBUG xyz.docbleach.module.ole2.OLE2Bleach - Entries after: [1Table, SummaryInformation, WordDocument, DocumentSummaryInformation]

#####

The CompObj and Macros entries are removed (not copied), so the Macros *can't* 
work.
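
To make the copy-and-skip step concrete, here is a minimal sketch in plain
Java. It is not DocBleach's actual code (that is at [2]) and it does not use
the POI API. It assumes a toy model where a "directory" is a Map and a
"stream" is a byte[], and the names StorageCopySketch, copyWithout and
sampleTree are made up for this illustration. The entry names are the ones
from the log above; the "VBA" child and the empty contents are placeholders.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of an OLE2 container: a directory is a Map, a stream is a byte[]. */
final class StorageCopySketch {

    /** Returns a deep copy of source, skipping every entry whose name is in excluded. */
    static Map<String, Object> copyWithout(Map<String, Object> source, Set<String> excluded) {
        Map<String, Object> target = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : source.entrySet()) {
            if (excluded.contains(entry.getKey())) {
                continue; // not copied: the entry and everything below it is dropped
            }
            Object value = entry.getValue();
            if (value instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> child = (Map<String, Object>) value;
                target.put(entry.getKey(), copyWithout(child, excluded)); // recurse into directories
            } else {
                target.put(entry.getKey(), ((byte[]) value).clone()); // copy stream bytes
            }
        }
        return target;
    }

    /** Builds a tree with the entry names from the log above; contents are placeholders. */
    static Map<String, Object> sampleTree() {
        Map<String, Object> macros = new LinkedHashMap<>();
        macros.put("VBA", new LinkedHashMap<String, Object>()); // illustrative child directory
        Map<String, Object> root = new LinkedHashMap<>();
        for (String name : List.of("CompObj", "1Table", "SummaryInformation",
                "WordDocument", "DocumentSummaryInformation")) {
            root.put(name, new byte[0]);
        }
        root.put("Macros", macros);
        return root;
    }

    public static void main(String[] args) {
        Map<String, Object> cleaned = copyWithout(sampleTree(), Set.of("Macros", "CompObj"));
        // prints [1Table, SummaryInformation, WordDocument, DocumentSummaryInformation]
        System.out.println(cleaned.keySet());
    }
}
```

The point of the sketch: this kind of copy only filters at the directory
level. Whatever is stored inside the WordDocument or Workbook stream is
copied byte for byte, which is where my problem seems to come from.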

I've tried a lot of things, especially with Excel files (they only contain a
Workbook, SummaryInformation and DocumentSummaryInformation), and I've found
that the Workbook is at fault: the two summaries do not contain the
"macro reference", and since I recreate the file from scratch, the reference
has to be in one of the entries.

If I understand correctly, there are "entries" in the Workbook/WordDocument
holding the Macros.
I found "stwUser" in the Word documentation [3], and I assume that I need to
remove it, but I couldn't find a unified API to do so for
Word/Excel/PowerPoint documents.

My question: is there an "easy" API to interact with these entries and remove
parts of them?
If so, could you please give me some leads/examples on how to do it?
If not, do you have any tips on how to achieve something similar?

I could iterate over the Workbook/Document to copy it over manually, without
the Macros…
but if there is no unified API for that, I would have to do it separately for
XLS/Word/PPT files, right?
That doesn't seem like the easy path! :-(

Thanks in advance!

- PunKeel

[1]: https://github.com/docbleach/DocBleach
[2]: https://github.com/docbleach/DocBleach/blob/master/module/module-office/src/main/java/xyz/docbleach/module/ole2/OLE2Bleach.java
[3]: https://msdn.microsoft.com/en-us/library/dd923194(v=office.12).aspx