Casey Jones wrote:
> That's a clever way of dramatically increasing the "uniqueness" of each
> character to reduce the ambiguity of the OCR. It would be useful for
> both error detection and error correction. If it could be integrated
> into the OCR engine itself, it would be even more effective. Although
> Gallager or Turbo Codes would give much better error correction for a
> given storage space, your method would be way easier to implement.
>
> I'm leaning strongly against base64. There are just too many characters
> that can be easily confused. Base32 would be nearly as dense (5 bits
> instead of 6, per char) and would allow many tough characters to be left
> out. A simple conversion chart for base32 chars could take up just one
> line at the bottom of the page. The conversion to base32 and back would
> be very easy. Selecting the unambiguous 32 characters to use as the
> symbol set would require some care. Maybe some testing to find out which
> symbols the OCR programs get wrong most often.
Information density isn't the goal here. To lay out my context: my general strategy is to encrypt my big .tar nightlies and offsite them. The survivability of the media the big stuff is on then becomes effectively someone else's problem. (Not perfect, but good enough, and if you keep everything redundant, there's no real issue.) But you can't reasonably offsite the private key in the same way; otherwise, how do you open everything when the time comes? With the system I've concocted, secring.gpg can be printed in under 300 lines. I peg that at around 4 one-sided pages of recoverable text, a small price to pay to maintain control of a key.

Actually, the draw of this idea as far as I'm concerned is that it's highly transparent. I'm very interested in ideas like PDF417 and QR, but there's a lot of support software involved that might not be so readily available (or compilable) in a pinch. Base64, on the other hand, fits in my head with very little effort. This means that, even in the outright absence of software that handles base64, I could MacGyver something up without too much trouble in nearly any programming language that makes sense. (I'm generally YAPH, i.e. a Perl hacker, but I've been messing with awk a lot lately, since it's ubiquitous on any platform with an X in its name. But b64 is simple enough to do in C, or even VB if you must, or perhaps INTERCAL/brainf*ck/... if you enjoy an insane challenge.)

It must be noted that there's often a much easier way, though: base64 can be jimmied into an .eml-format file by using a mail client to create an e-mail with a dummy attachment, then replacing the attachment's encoded contents with a text editor and re-opening the message. (This trick has actually gotten me through some jams before!) In this respect, it helps that base64 is also extremely ubiquitous; there's almost certainly an implementation already on your machine.
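To illustrate how little it takes to MacGyver base64 from memory, here is a minimal sketch of a from-scratch encoder and decoder for the standard RFC 4648 alphabet. It uses Python rather than awk or C purely for brevity; the names `b64_encode` and `b64_decode` are hypothetical and not part of any library.

```python
# Minimal from-scratch base64 (RFC 4648 standard alphabet), using only
# a lookup table and bit shifts, the way you might rebuild it from memory.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def b64_encode(data: bytes) -> str:
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Pack up to 3 bytes into a 24-bit group, zero-filling a short final chunk.
        n = int.from_bytes(chunk + b"\x00" * (3 - len(chunk)), "big")
        # Split the 24-bit group into four 6-bit alphabet indices.
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # 1 input byte -> 2 real chars, 2 bytes -> 3, 3 bytes -> 4; pad with '='.
        keep = len(chunk) + 1
        out.append("".join(chars[:keep]) + "=" * (4 - keep))
    return "".join(out)


def b64_decode(text: str) -> bytes:
    # Drop whitespace and line breaks, as when re-typing text from paper.
    text = "".join(text.split())
    if len(text) % 4:
        raise ValueError("base64 length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(text), 4):
        group = text[i:i + 4]
        pad = group.count("=")
        n = 0
        for ch in group:
            # '=' contributes zero bits; unknown characters raise ValueError.
            n = (n << 6) | (0 if ch == "=" else ALPHABET.index(ch))
        # Each 4-char group carries 3 bytes, minus one per '=' of padding.
        out += n.to_bytes(3, "big")[:3 - pad]
    return bytes(out)
```

For example, `b64_encode(b"Man")` gives `"TWFu"`, and `b64_decode("TW\nFu")` returns `b"Man"`. Ignoring whitespace on decode is a deliberate choice for text keyed in from a printout.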
Getting base64 data back into a machine isn't trivial, but again, it can be argued that you have most or all of what you need at any workstation (unless you're blind, but even then it's not out of the question). Barcode and data-matrix standards may wax and wane, but we can hopefully agree that OCR isn't going away anytime soon. Besides, even if by some freak accident OCR were to drop off the face of the Earth, there are still human eyes, human minds, and possibly even administrative assistants willing to take dictation. ;-)

The translated digraph base64 in the third column would probably be easy enough to figure out even without the translation key, via some simple "cryptanalysis" (I'm not suggesting the tr step is a cipher, but it does act like one): if the message is clear enough to be human-readable, it would itself provide a more or less complete ciphertext-to-plaintext mapping. I haven't done a great deal of research into how valuable the mapping I chose (tr 'A-Za-z0-9+/=' '0-9A-Z+/=a-z') actually is, but it's not an entirely random choice. In particular, it makes sure that A-Z and a-z aren't adjacent, so that, for example, S and s don't map to an equivalently similar upper/lower-case pair (S becomes I, while s becomes f). It probably merits more investigation, but I want to implement the original scheme first and do some live testing to verify that there's even a problem to correct.

Probably the only complicated part is the CRC-24; you might have to be just slightly hardcore to memorize the polynomial involved (though the rest isn't that hard; I'm just not the "digits of pi" type). But that's mostly a tool for auto-correction anyway; you could get a long way with just the first and third columns.

By the way, last night I decided to try implementing CRC-24 in awk. It seems to have worked. It's not terribly efficient: I tried to stick to POSIX rules for portability, and POSIX awk has no XOR operator. Implementing XOR using substr is a rather humorous farce, I must say...
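The tr mapping quoted above can be reproduced with Python's `str.maketrans`, which makes it easy to check the claim about case pairs. This is a sketch; the names `ENCODE`, `DECODE` and `case_partner_collisions` are hypothetical helpers, not part of the original tooling.

```python
import string

# Python equivalent of: tr 'A-Za-z0-9+/=' '0-9A-Z+/=a-z'
SRC = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/="
DST = string.digits + string.ascii_uppercase + "+/=" + string.ascii_lowercase

# Both strings are 65-character permutations of the same set, so the
# reverse table undoes the forward one exactly.
ENCODE = str.maketrans(SRC, DST)
DECODE = str.maketrans(DST, SRC)


def case_partner_collisions():
    """Letters X where images of X and x still form an upper/lower pair."""
    hits = []
    for upper in string.ascii_uppercase:
        a = upper.translate(ENCODE)
        b = upper.lower().translate(ENCODE)
        if a.isalpha() and b.isalpha() and a.swapcase() == b:
            hits.append(upper)
    return hits
```

Under this mapping `"TWFu".translate(ENCODE)` yields `"JM5h"`, translating back with `DECODE` restores the original, and `case_partner_collisions()` returns an empty list. Uppercase letters K-Z land on A-P while lowercase n-z land on a-m, so no letter and its case partner end up as a case pair.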
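As a rough sketch of the CRC-24 part, the code below assumes the CRC-24 in question is the one OpenPGP ASCII armor uses (RFC 4880, section 6.1: initial value 0xB704CE, polynomial 0x1864CFB); the message does not say so explicitly. To mirror the POSIX-awk constraint, it computes XOR with arithmetic only (`%`, `//`, `*`, `+`). That is a different technique from the substr-based XOR described above, and `xor` and `crc24` are hypothetical names.

```python
# Assumption: OpenPGP's CRC-24 (RFC 4880, section 6.1).
CRC24_INIT = 0xB704CE
CRC24_POLY = 0x1864CFB  # generator polynomial, including the x^24 term


def xor(a: int, b: int) -> int:
    """Bitwise XOR of two non-negative ints using arithmetic only,
    as you would have to in POSIX awk (which lacks an XOR operator)."""
    result, bit = 0, 1
    while a > 0 or b > 0:
        # The two low bits differ exactly when their sum is odd.
        if (a % 2 + b % 2) % 2 == 1:
            result += bit
        a //= 2
        b //= 2
        bit *= 2
    return result


def crc24(data: bytes) -> int:
    crc = CRC24_INIT
    for byte in data:
        # Fold the next byte into the top 8 bits of the 24-bit register.
        crc = xor(crc, byte * 65536)
        for _ in range(8):
            crc *= 2  # shift left by one bit
            if crc >= 0x1000000:  # bit 24 set: reduce by the polynomial
                crc = xor(crc, CRC24_POLY)
    return crc % 0x1000000  # keep the low 24 bits
```

With this sketch, `crc24(b"")` is simply the initial value 0xB704CE, and `crc24(b"123456789")` gives 0x21CF02, the published check value for CRC-24/OPENPGP. Replacing `xor` with Python's `^` operator produces identical results; the arithmetic version exists only to show what the awk constraint forces on you.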
So, the long and short of it: stay tuned. I'm close to a first implementation, and test messages will be passed. :-)

Thanks,
PSM
_______________________________________________
Gnupg-users mailing list
Gnupg-users@gnupg.org
http://lists.gnupg.org/mailman/listinfo/gnupg-users