3 cubic meters is about 60-90 book boxes of the size the mover gave us for our books the last time we moved.

If you are going to do it yourself (over a long period of time, I hope), I recommend the ScanSnap iX500 scanner. It scans about 25 sheets per minute (50 sides, since it scans both sides simultaneously), has a 50-sheet feeder, and does fairly intelligent detection of double feeds and blank sides. You have to be careful to check for dust buildup. The software that comes with it is pretty good, too.
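
To get a feel for "a long period of time", here is a rough back-of-envelope sketch in Python. Only the 3 cubic meters and the 25 sheets per minute come from this thread; the sheet size (A4), the sheet thickness (0.1 mm), and the guess that half the volume is actually paper are my assumptions, so treat the result as an order of magnitude, not a measurement.

```python
def scan_estimate(volume_m3,
                  sheets_per_min=25.0,         # scanner speed quoted above
                  sheet_area_m2=0.210 * 0.297, # assumption: A4 sheets
                  sheet_thickness_m=0.0001,    # assumption: ~0.1 mm per sheet
                  fill_fraction=0.5):          # assumption: half the pile is air
    """Return (sheet count, hours of pure feeder time) for a pile of paper."""
    paper_volume = volume_m3 * fill_fraction
    sheets = paper_volume / (sheet_area_m2 * sheet_thickness_m)
    hours = sheets / sheets_per_min / 60.0
    return sheets, hours


if __name__ == "__main__":
    sheets, hours = scan_estimate(3.0)
    print(f"about {sheets:,.0f} sheets, about {hours:,.0f} hours of feeding")
```

With these guesses the pile comes out at a few hundred thousand sheets and well over a hundred hours of feeder time, before any sorting, staple removal, or reloading. Change fill_fraction to match how loosely your boxes are packed.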

For processing, classifying, and storing the files, I recommend DevonThink Pro Office if you have a Mac. It has some built-in intelligence for finding similar content across documents, which supports auto-classification and "see also" functionality. I confess I haven't really given that part of it a test. The OCR of your documents can be done by either ScanSnap or DevonThink. DevonThink does not lock in your data: your documents remain ordinary files in the OS, and can optionally be stored in the DT 'database', which is just a bundle of files with indexes.

There are commercial scanning services, but I've never checked their prices. If you scan the documents yourself, you will probably end up hating staples as much as I do. Staples can go through the scanner easily and harmlessly, but if one holds 2 or more sheets together, you'll have to unjam the document feed. Booklets are no problem if you can take the pages apart. Books are no problem if you are happy bandsawing the spine off.

And it is very satisfying having everything on a hard drive, fully backed up, fully indexed. Or so I believe -- I haven't gotten through my stack yet.

--Barry



On 1 May 2016, at 22:34, Arlo Barnes wrote:

We have talked a little on this list about related topics, but I figured I
would ask people's opinions outright.

I have about 3 cubic meters of assorted paper documents -- and by assorted
I mean both unsorted into categories and of various types.
For example, there are papers that are unimportant and should be set aside
for disposal. There are papers of mild interest that should be kept if
possible (in digital form, as their physical presence has no value beyond
the contained information, and negative value in space taken up and mental
clutter added). There are documents that should be digitized but cannot be
disposed of, as their physical form is important to their existence
(certificates, for instance). Some of the information in the documents is
sensitive, and since it is mixed in, the whole pile should be treated as
such (although there is nothing that could not be shown to a
well-trusted entity). And the papers are not all of the same size or stock;
some of them are loose, some are pamphlets, brochures, or even slim books.

Once they are digitized they will also need to be semanticized and related
to one another to start to make sense of them.
So, how should I go about this? Would mechanisation of some form help? Can
this even reasonably be done by one person?

-Arlo James Barnes
============================================================
FRIAM Applied Complexity Group listserv
Meets Fridays 9a-11:30 at cafe at St. John's College
to unsubscribe http://redfish.com/mailman/listinfo/friam_redfish.com
