Hey, PunKeel, this is great!

If your software is based on POI and you'd like to upstream some of your
changes to make your library more straightforward, send us a
pull request and we'll review it, give feedback, and commit it.

I have barely dabbled in how VBA projects are saved in the OLE2
formats (VBAMacroReader class), but perhaps others have some ideas and
a few free cycles to spare (being a volunteer project and all).

Keep in mind that PPT files save macros in a different part of the
OLE2 file structure than XLS and DOC do. It's a reminder that these
binary formats weren't developed by the same software team or at
the same time.

The following bugs might be of interest to you:
https://bz.apache.org/bugzilla/buglist.cgi?bug_id=52949%2C59302%2C60273%2C59830%2C59858%2C60158&list_id=159842

Feel free to continue this discussion over on d...@poi.apache.org,
where there might be a more technical audience who could point out
some of the POI internals. Most of the POI devs monitor both mailing
lists, so which list you use probably doesn't matter much.

Javen

On Tue, May 2, 2017 at 6:19 PM, PunKeel <punk...@me.com> wrote:
> Hello,
>
> I've found a solution for Excel files: removing the "ObProj" record from
> the Workbook.
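
For anyone following along: at the byte level, dropping a record from a
BIFF stream amounts to walking (type, length, payload) triples and skipping
the ones you don't want. Below is a rough sketch of that idea only. The
class and method names are made up for illustration and are not POI or
DocBleach APIs, and the ObProj record number (0x00D3) is my reading of
[MS-XLS], so please double-check it. It also ignores anything that stores
absolute stream offsets (the BoundSheet records, as far as I know), so it
shows the record layout rather than something to run on real files.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration; not a POI or DocBleach API.
class BiffRecordStripper {
    // Assumption: ObProj is record type 0x00D3 (211), per my reading of [MS-XLS].
    static final int OBPROJ_SID = 0x00D3;

    // Copies every (sid, length, payload) record except those whose sid matches.
    static byte[] strip(byte[] stream, int sidToDrop) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int pos = 0;
        while (pos + 4 <= stream.length) {
            // Both header fields are little-endian unsigned 16-bit values.
            int sid = (stream[pos] & 0xFF) | ((stream[pos + 1] & 0xFF) << 8);
            int len = (stream[pos + 2] & 0xFF) | ((stream[pos + 3] & 0xFF) << 8);
            int total = 4 + len;
            if (pos + total > stream.length) {
                throw new IllegalArgumentException("Truncated record at offset " + pos);
            }
            if (sid != sidToDrop) {
                out.write(stream, pos, total);
            }
            pos += total;
        }
        if (pos != stream.length) {
            throw new IllegalArgumentException("Trailing bytes at offset " + pos);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Hand-made sample: three tiny records, the middle one typed as ObProj.
        byte[] sample = {
            0x09, 0x08, 0x02, 0x00, 0x00, 0x06,   // sid 0x0809, 2-byte payload
            (byte) 0xD3, 0x00, 0x00, 0x00,        // sid 0x00D3, empty payload
            0x0A, 0x00, 0x00, 0x00                // sid 0x000A, empty payload
        };
        System.out.println(strip(sample, OBPROJ_SID).length); // prints 10
    }
}
```

Going through POI's record model instead of raw bytes is likely the safer
route for real workbooks, since the library re-serializes the stream for you.
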
>
> The code I'm using is available here:
> https://gist.github.com/PunKeel/0e72ccde78cb0150383a9ced094c2bce
> I don't think this is the cleanest way to achieve it, so I'm open to
> suggestions.
>
> It also doesn't work for Word/PPT files. If I understand correctly (let's
> hope I do), the
> (Current User, PowerPoint Document) and (WordDocument, 0Table, 1Table)
> streams
> need to be edited by hand because Apache POI is lacking the APIs for these
> formats.
> ~> How? Are they like "DocumentStreams"? May I use the "RecordFactory"?
>
> Am I right? I am more than open to suggestions/help, please!
>
> Best regards,
>
>
> On 1 May 2017, at 6:15 PM, PunKeel <punk...@me.com> wrote:
>
> Hello,
>
> I am currently building open-source software to disarm Office files,
> named DocBleach [1],
> but I am stuck on some specifics of the OLE2 format.
>
> First of all, I would like to thank you for the great library that is Apache
> POI!
>
> When opening disarmed OLE2 files in Office Excel/Word on Windows 10 (haven't
> checked other versions),
> an alert is displayed, depending on the editor:
>
> - Excel says that the file is corrupted and needs to be repaired. The error:
> "Lost Visual Basic project".
> ~> Once repaired, the Macro Viewer is unable to tell us the name of the
> Macros
>
> - Word tells us that the file is unsafe because it contains Macros.
> ~> The "Macro Viewer" is able to tell us the name of the Macros
>
> ----
>
> As you know, OLE2 files are "file systems in a file".
> In order to remove the Macros of a document, I remove the Macros
> "directory".
>
> Sample log, for the record. (The process being the same for Word/Excel, I'll
> only give one).
> Relevant code is available at [2]
> Sample files, with their sanitised form (named "-free")
> ~> https://www.mediafire.com/folder/yh122tgbyzadw/Sample_2017_May_1
>
> #####
>
> $ java -jar docbleach.jar -vv -in ../Goodware/macro.doc -out - >
> macro-free.doc
> [main] DEBUG xyz.docbleach.cli.Main - Log Level: TRACE
> [main] TRACE xyz.docbleach.api.bleach.CompositeBleach - Using bleach: OLE2
> Bleach
> [main] DEBUG xyz.docbleach.module.ole2.OLE2Bleach - Entries before:
> [CompObj, 1Table, SummaryInformation, WordDocument,
> DocumentSummaryInformation, Macros]
> [main] DEBUG xyz.docbleach.module.ole2.OLE2Bleach - Root ClassID:
> {00020906-0000-0000-C000-000000000046}
>
> [main] TRACE xyz.docbleach.module.ole2.OLE2Bleach - copyNodesRecursively:
> SummaryInformation, parent:
> org.apache.poi.poifs.filesystem.DirectoryNode@2e5c649
> [main] TRACE xyz.docbleach.module.ole2.OLE2Bleach - copyNodesRecursively:
> DocumentSummaryInformation, parent:
> org.apache.poi.poifs.filesystem.DirectoryNode@2e5c649
> [main] TRACE xyz.docbleach.module.ole2.OLE2Bleach - copyNodesRecursively:
> WordDocument, parent: org.apache.poi.poifs.filesystem.DirectoryNode@2e5c649
> [main] TRACE xyz.docbleach.module.ole2.OLE2Bleach - copyNodesRecursively:
> 1Table, parent: org.apache.poi.poifs.filesystem.DirectoryNode@2e5c649
>
> [main] DEBUG xyz.docbleach.module.ole2.OLE2Bleach - Entries after: [1Table,
> SummaryInformation, WordDocument, DocumentSummaryInformation]
>
> #####
>
> The CompObj and Macros entries are removed (not copied), so the Macros
> *can't* work.
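
Inline note: the directory-level copy shown in the log can be modelled as
a plain name filter over the root entries. A minimal sketch, assuming the
two skipped names are exactly as the log prints them (the real entry names
may carry a non-printable prefix). The class and method names are
hypothetical; this is not POI or DocBleach code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper modelling the copy step; not part of POI or DocBleach.
class RootEntryFilter {
    // Assumption: these are the entry names to skip, taken from the quoted log.
    static final Set<String> SKIPPED = new HashSet<>(Arrays.asList("Macros", "CompObj"));

    // Returns the entries that would be copied, preserving their original order.
    static List<String> entriesToCopy(List<String> rootEntries) {
        List<String> kept = new ArrayList<>();
        for (String name : rootEntries) {
            if (!SKIPPED.contains(name)) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // "Entries before" as listed in the log.
        List<String> before = Arrays.asList("CompObj", "1Table", "SummaryInformation",
                "WordDocument", "DocumentSummaryInformation", "Macros");
        System.out.println(entriesToCopy(before));
        // prints [1Table, SummaryInformation, WordDocument, DocumentSummaryInformation]
    }
}
```

Because the filter only looks at entry names, anything inside WordDocument
or Workbook that refers to the VBA project is copied through untouched,
which would be consistent with the warnings Word and Excel show.
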
>
> I've been trying a lot of things, especially with Excel files (they only
> contain a Workbook,
> SummaryInformation and DocumentSummaryInformation) and I've found out the
> Workbook
> was at fault: the two summaries did not contain the "macro reference", and I
> recreate the file
> from scratch so it has to be in an entry.
>
> If I understand correctly, there are "entries" in the Workbook/WordDocument
> holding the Macros.
> I found "stwUser" in the Word documentation [3], and I assume that I need to
> remove it, but couldn't
> find a unified API to achieve it for Word/Excel/PowerPoint documents.
>
> My question: is there an "easy" API to interact with these entries, removing
> parts of it?
> If so, could you please give me some leads/examples on how to do it?
> If not, do you have tips on how to achieve something similar?
>
> I could iterate over the Workbook/Document to copy it over manually, without
> the Macros…
> but if the API allowing it is not unified, I would have to do it for
> XLS/Word/PPT files, right?
> Doesn't seem like the easy path! :-(
>
> Thanks in advance!
>
> - PunKeel
>
> [1]: https://github.com/docbleach/DocBleach
> [2]:
> https://github.com/docbleach/DocBleach/blob/master/module/module-office/src/main/java/xyz/docbleach/module/ole2/OLE2Bleach.java
> [3]: https://msdn.microsoft.com/en-us/library/dd923194(v=office.12).aspx
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@poi.apache.org
> For additional commands, e-mail: user-h...@poi.apache.org
>
>
