The thought was to keep an MD5 of each file (or similar), and if that changes 
then trigger the actual validation.  First run would be intense, but most files 
don't change much.  Perhaps ever.
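
A rough sketch of that idea in Python, just to make it concrete. The function names and the in-memory dictionary are made up for illustration; a real version would keep the digests in a table between runs.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_needing_validation(root, known_hashes):
    """Return files under root whose digest differs from the stored one.

    known_hashes (hypothetical store) maps path strings to the digest seen
    on the previous run. It is updated in place, so a second call only
    reports files that changed since the first.
    """
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        digest = file_md5(path)
        if known_hashes.get(str(path)) != digest:
            known_hashes[str(path)] = digest
            changed.append(path)
    return changed
```

On the first run every file comes back as changed, which is the "intense" pass; after that only new or modified files would go on to the actual validation.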

It's funny you mention LibreOffice, because a suggestion I received was to use 
the command line tool 'soffice.exe' which is part of LibreOffice to check the 
office documents.  Basically, if soffice can turn it into a PDF (to be deleted 
afterward), then the file would be considered a 'valid' file.  
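
Something along these lines might work as a first test, sketched in Python. It assumes soffice is on the PATH and uses its headless conversion switches; the function names and the timeout are placeholders, not anything we have settled on.

```python
import subprocess
import tempfile
from pathlib import Path


def soffice_command(document, out_dir, soffice="soffice"):
    """Build the headless LibreOffice command that converts a document to PDF."""
    return [soffice, "--headless", "--convert-to", "pdf",
            "--outdir", str(out_dir), str(document)]


def converts_to_pdf(document, soffice="soffice", timeout=120):
    """Return True if soffice produced a PDF for the document.

    The PDF is written to a temporary directory that is removed afterward,
    so nothing is left behind. The result is judged by whether the PDF
    exists, not only by the exit code.
    """
    with tempfile.TemporaryDirectory() as out_dir:
        try:
            subprocess.run(soffice_command(document, out_dir, soffice),
                           check=True, capture_output=True, timeout=timeout)
        except (OSError, subprocess.SubprocessError):
            # Missing executable, non-zero exit, or timeout all count as failure.
            return False
        return (Path(out_dir) / (Path(document).stem + ".pdf")).is_file()
```

Whether a successful conversion really means the file is 'valid' is still an open question; this only shows that LibreOffice could read it.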

With regard to images, ImageMagick's "identify" tool would produce the metadata 
from image files. It is also in the running to enter the test phase of this 
project.  http://www.imagemagick.org/script/identify.php
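
As a sketch of how that check might be scripted in Python: this assumes the "identify" executable is on the PATH and asks it for the format, width, and height through its -format option. The helper names are invented for illustration.

```python
import subprocess


def parse_identify_line(line):
    """Parse one 'FORMAT WIDTH HEIGHT' line into a dict, or return None."""
    parts = line.split()
    if len(parts) != 3 or not (parts[1].isdigit() and parts[2].isdigit()):
        return None
    return {"format": parts[0], "width": int(parts[1]), "height": int(parts[2])}


def identify_image(path, identify="identify"):
    """Return basic metadata for an image, or None if identify cannot read it."""
    try:
        result = subprocess.run(
            [identify, "-format", "%m %w %h\n", str(path)],
            check=True, capture_output=True, text=True, timeout=60)
    except (OSError, subprocess.SubprocessError):
        # Missing executable, unreadable image, or timeout.
        return None
    # Multi-frame images print one line per frame; the first is enough here.
    lines = result.stdout.splitlines()
    return parse_identify_line(lines[0]) if lines else None
```

A file that returns None would be flagged for a closer look rather than declared corrupt outright.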

In the case of a Word document with a macro virus, hopefully (fingers crossed!) 
the malware scan would find it as soon as it was saved. If we're using 
LibreOffice, we'd hopefully have the option to disable macros when (test) 
converting it to PDF.

This definitely would be interesting. I hope I get the green light to work on 
it.

"Most useful complex projects begin their lives as useful simple projects."

-Kevin

-----Original Message-----
From: ProFox [mailto:[email protected]] On Behalf Of Ted Roche
Sent: Wednesday, August 17, 2016 2:29 PM
To: [email protected]
Subject: Re: Common File Document Validation

That's a great question!

Obviously, since the post's subject didn't include "[NF]" you've already found 
your solution -- FoxPro! *wink*

I've done some document management systems in VFP, and the recursion, 
cataloging and checksums are easy, relatively speaking. But the validation is an 
interesting twist, and a much more difficult problem.

Triggering the checking is also an interesting feature. Doing a bulk rescan 
would be slow and intensive, though you could tune it to not consume excessive 
resources, at a cost of slower checking.

Windows File Systems have some advanced features in the newer servers that 
would let you hook into a file system event (adding a new file or saving over 
an old one) to trigger your validation routine. If WinFS had ever been 
released, (https://en.wikipedia.org/wiki/WinFS) that would have been perfect, 
but alas, it was another empty vaporware promise of "The Old Microsoft." 
However, some of "Longhorn" did end up in DotNet, like:

https://msdn.microsoft.com/en-us/library/system.io.filesystemwatcher.changed(v=vs.110).aspx

A simpler solution might be a "Document Management System" but implementing one 
of these is a tough challenge in technology, politics, and technical support.

"Validity" is a bit nebulous. How are you defining that?

I mean, there are Word95 documents I can't open in Word2007, but can in 
LibreOffice. And is a Word document with a macro virus valid? How many versions 
and variations to support? How to handle password-encrypted or restricted files?

VFP would be a great tool for doing the validation, where you can use low-level 
file functions to read headers and calculate checksums, but complex structured 
documents, like MS's Compound OLE Documents and MS's ZIP-packaged XML 
DocX documents, get a lot trickier.
There's typically a "magic" signature at the beginning of most files that will 
tell you its type, but whether all the contents have integrity is a lot 
tougher to determine. I suspect each format would need to be reviewed to 
determine if there were internal consistency checks that would tell you of 
corruption or truncation.
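
To illustrate the two levels, here is a minimal Python sketch (not VFP, and not a complete validator). It sniffs a handful of well-known signatures, then uses the CRC check built into ZIP containers as one example of a format-level consistency test. The signature table is deliberately short and the function names are hypothetical.

```python
import zipfile
import zlib

# A few well-known leading byte signatures; far from exhaustive.
SIGNATURES = {
    b"%PDF-": "pdf",
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"PK\x03\x04": "zip container (docx, xlsx, odt, ...)",
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "ole compound file (doc, xls, ...)",
}


def sniff_type(path):
    """Return a type name based on the file's leading bytes, or None."""
    with open(path, "rb") as handle:
        head = handle.read(8)
    for magic, name in SIGNATURES.items():
        if head.startswith(magic):
            return name
    return None


def zip_container_intact(path):
    """Return True if every member of a ZIP container passes its CRC check."""
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() returns the first bad member name, or None if all pass.
            return archive.testzip() is None
    except (zipfile.BadZipFile, OSError, zlib.error):
        # Truncated or otherwise unreadable archives end up here.
        return False
```

A truncated DocX still starts with the right signature, so sniff_type alone would pass it; only the container check catches the damage. Other formats would each need their own equivalent, if one exists.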

Sounds like an interesting project, though. Will be interested to hear if you 
find a suitable package, or DIY it.

--
Ted Roche
Ted Roche & Associates, LLC
http://www.tedroche.com

