Re: Long-term archiving of electronic text documents

Jim Breen Mon, 28 Jan 2013 16:08:11 -0800

William_J_G Overington <[email protected]> wrote:

> The idea is that there would be an additional UTF format, perhaps UTF-64,
> so that each character would be expressed in UTF-64 notation using 64 bits,
> thus providing error checking and correction facilities at a character level.


Error detection and correction at the character level is considered
very old-fashioned now. Modern techniques such as Reed-Solomon
codes[1] are much more effective and involve much less overhead
than the 100% overhead in the proposal above. Such techniques are
already used in modern disc storage[2], and when combined with RAID
techniques[3] provide better data protection than character-level
redundancy ever would.
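
As a rough illustration of the overhead argument, here is a small
Python sketch. The RS(255, 223) parameters are an assumption chosen
purely for illustration (a commonly cited Reed-Solomon configuration),
and reading the 100% figure as 32 check bits added to a 32-bit code
unit is likewise an assumption, not something stated in the proposal.

```python
def overhead_percent(data_units, check_units):
    """Redundancy as a percentage of the payload size."""
    return 100.0 * check_units / data_units


def rs_correctable_symbols(n, k):
    """An RS(n, k) code can correct up to (n - k) // 2 symbol errors per block."""
    return (n - k) // 2


# Assumed example code: 255-symbol blocks carrying 223 data symbols.
N, K = 255, 223

# Assumed reading of the proposal: 32 data bits plus 32 check bits per character.
DATA_BITS, CHECK_BITS = 32, 32

if __name__ == "__main__":
    rs = overhead_percent(K, N - K)
    per_char = overhead_percent(DATA_BITS, CHECK_BITS)
    print(f"RS({N},{K}) overhead: {rs:.1f}%, "
          f"corrects up to {rs_correctable_symbols(N, K)} symbols per block")
    print(f"Per-character scheme overhead: {per_char:.1f}%")
```

The block-level code spreads its check symbols over many characters,
which is why it can cover bursts of damage at a fraction of the cost
of attaching redundancy to every character individually.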

In any case, I think issues of error detection and correction are
quite outside the scope of Unicode.

Cheers

Jim

[1] http://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction
[2] http://en.wikipedia.org/wiki/Error_detection_and_correction#Data_storage
[3] http://en.wikipedia.org/wiki/RAID

-- 
Jim Breen
Adjunct Snr Research Fellow, Japanese Studies Centre, Monash University
