As part of my newly begun Ph.D studies, I am planning to extend Sage's
functionality in error correcting codes. I have already had some
conversations with David Joyner about this, and we agreed that before
any major work should be undertaken, a general framework of the
classes and functionality should be discussed here, as the form of the
current code is not very scalable. Once we agree enough about
someting, this discussion could spawn a number of more specific tracs
for the different areas of improvement as well as the conforming of
the current functionality to the decided framework. I apologise that
already my initial proposal is rather lengthy, but it is difficult to
be more concise without losing precision.

I am sure that this touches upon many general ideologic issues already
thoroughly discussed and decided upon long ago, so I hope that many
Sage veterans will join this discussion, and not only coding
theorists.

Let's begin by shortly explaining the field: error-correcting codes is
about encoding a message into a longer string (a codeword), which is
to be sent over some non-perfect communication channel that might
corrupt a few of the ciphers of the stream; the point being that a
receiver can correct the faulty ciphers of the original message by
using the redundancy in the encoded stream, and from this retrieve the
original message. We usually represent the strings of data as vectors
over some finite field F, so the encoding is an injective function
from F^k -> F^n, where n > k. Decoding is a bit more complex: usually,
we wish to decode to the most likely candidate, assuming random
errors. Whenever there are few errors, this leads to one possibility,
but when there are many errors, it might be a list of possibilities
(list decoding). This also reflects the fact that encoding is usually
straightforward and cheap (just one algorithm), while decoding is
complex and expensive (and several algorithms could be applicable).

Most codes studied are linear codes, meaning that the codewords of the
code is a linear subspace of F^n. Many of their properties are studied
by looking directly at the basis matrix of this subspace, called the
generator matrix of the code. There are numerous identified
subfamilies of linear codes, many of which overlap in various
important cases.

On top of this, it would be nice if the framework supported generic
various testing of the efficiency, speed, error-correcting capability
etc. of a code and encoding/decoding algorithms.

There are other concerns to be sure, but my initial suggestion for
structuring this is as follows:

A family of codes is represented by a class (e.g. LinearCode) with an
appropriate constructor taking needed arguments (e.g. a generator
matrix for linear codes). An instance of such a class defines all that
is needed for defining (i.e. encoding) the specific member of the code
family in question. It has various generic functions that all codes
must have (e.g. k and n in the above notation, base_ring for the
alphabet). It might have functions for calculating various properties
of the code (e.g. minimum_distance or weight enumerator). It has an
encode function taking a list of length k over the base ring, which
will return the corresponding codeword.
Such a class should also have a decode function implementing the
fastest "standard" decoding algorithm (e.g. for Reed-Solomon codes,
this could be the Berlekamp-Massey decoder). If more than one decoding
algorithm has been implemented, it could be specified as an optional
second argument which one to use.

As mentioned, one particular family of codes might be completely
contained in another family while partly overlapping with several
others (for example, a Reed-Solomon code is always a linear code but
only sometimes cyclic). This could be solved by multiple inheritance
as well as a code class for each possible intersection between any two
families of codes. However, even though they are nice to have handy,
most of the time a sub-family does not need the specificities of its
super-family (e.g. when working with a Reed-Solomon code, it is only
rarely interesting to know its generator matrix), so time should not
be wasted on computing these in all cases -- something which is hard
to avoid and predict when using inheritance.
Instead, I propose using a combination of direct conversion functions
as well as the "Quack like a duck" type-ideology of Python. For
example, the ReedSolomonCode class should not inherit from LinearCode,
but has a function to_linear_code, which constructs and returns the
appropriate LinearCode instance. Furthermore, for convenience, the
ReedSolomonCode itself implements most of the functions usually
queried (e.g. minimum_distance, which, coincidentally, is easily
calculated for this Reed-Solomon codes), either by a sub-family
specific reimplementation or by constructing the LinearCode in the
background (and saving it for future queries) and forwarding the
function call. This last part should be to such an extend that most
functions expecting a LinearCode can take an instance of
ReedSolomonCode instead -- e.g. a generic testing framework. This also
rather elegantly extends to the case of overlapping families: a
ReedSolomonCode has the functions is_cyclic and to_cyclic_code, where
the latter raises an exception in the case that the particular Reed-
Solomon code is not cyclic.
As a last remark, this suggestion could also be extended to support
some form of coercion between the classes, such that e.g.
{{{
    R = ReedSolomonCode(<appropriate constructor information>)
    R == R.to_linear_code()
    True
}}}
However, this also raises the question of equivalence between codes
and should perhaps be deferred to a later discussion.

So a code should be a class and a specific code an instance of one
such. However, for simplicity -- and because more generality seems
unnecessary --, I would keep the data strings as Python lists over the
base ring (with no specific encapsulating classes). So some code's
encode-function should take a list of the appropriate length and over
the code's base_ring. This would probably call for a module for
manipulating these lists as codewords (e.g. when decoding Reed-Solomon
codes, one often goes back and forth between a list representation and
a polynomial representation of the codewords)

Continuing to the decoding concerns. The protocol for a decoding
algorithm should be: return the one result in case of a unique-
decoding algorithm; raise an exception (DecodingFailed) in case of
total failure (possibly containing a best guess (usable e.g. in the
case of systematic encoding) or other decoding-algorithm-specific
information); and return a list of results in the case of successful
list decoding. A decoder that might sometimes return a list should
always do so for type safety, and a decoding algorithm should have a
function is_list_decoder returning whether or not this is the case.

Many decoders have to be initialised with a lot of precomputation. A
different concern is that the same decoding algorithm might work for
several families of codes (without one family being directly contained
in another), or there might be variants of the algorithm for different
families of codes which are so related that they could share a lot of
code. One way to handle all of this -- keeping the above suggestion
for code family representation in mind -- is for decoding algorithms
to be implemented as classes themselves, which are then instantiated
whenever some code's decode function is called. This also puts a nice
structure on the source code (for example, the Guruswami-Sudan
algorithm for both Reed-Solomon codes and general AG-codes can both
reside in decoders.GuruswamiSudan if that proved favorable). The
decode function of some code class would then simply be to instantiate
the appropriate decoding algorithm -- if it had not already been
instantiated and saved -- and use this object's decode function.

I hope this conveys my idea on the framework. I look forward to your
thoughts.
Cheers,
Johan

-- 
To post to this group, send an email to sage-devel@googlegroups.com
To unsubscribe from this group, send an email to 
sage-devel+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/sage-devel
URL: http://www.sagemath.org

Reply via email to