As part of my newly begun Ph.D studies, I am planning to extend Sage's functionality in error correcting codes. I have already had some conversations with David Joyner about this, and we agreed that before any major work should be undertaken, a general framework of the classes and functionality should be discussed here, as the form of the current code is not very scalable. Once we agree enough about someting, this discussion could spawn a number of more specific tracs for the different areas of improvement as well as the conforming of the current functionality to the decided framework. I apologise that already my initial proposal is rather lengthy, but it is difficult to be more concise without losing precision.
I am sure that this touches upon many general ideologic issues already thoroughly discussed and decided upon long ago, so I hope that many Sage veterans will join this discussion, and not only coding theorists. Let's begin by shortly explaining the field: error-correcting codes is about encoding a message into a longer string (a codeword), which is to be sent over some non-perfect communication channel that might corrupt a few of the ciphers of the stream; the point being that a receiver can correct the faulty ciphers of the original message by using the redundancy in the encoded stream, and from this retrieve the original message. We usually represent the strings of data as vectors over some finite field F, so the encoding is an injective function from F^k -> F^n, where n > k. Decoding is a bit more complex: usually, we wish to decode to the most likely candidate, assuming random errors. Whenever there are few errors, this leads to one possibility, but when there are many errors, it might be a list of possibilities (list decoding). This also reflects the fact that encoding is usually straightforward and cheap (just one algorithm), while decoding is complex and expensive (and several algorithms could be applicable). Most codes studied are linear codes, meaning that the codewords of the code is a linear subspace of F^n. Many of their properties are studied by looking directly at the basis matrix of this subspace, called the generator matrix of the code. There are numerous identified subfamilies of linear codes, many of which overlap in various important cases. On top of this, it would be nice if the framework supported generic various testing of the efficiency, speed, error-correcting capability etc. of a code and encoding/decoding algorithms. There are other concerns to be sure, but my initial suggestion for structuring this is as follows: A family of codes is represented by a class (e.g. LinearCode) with an appropriate constructor taking needed arguments (e.g. a generator matrix for linear codes). An instance of such a class defines all that is needed for defining (i.e. encoding) the specific member of the code family in question. It has various generic functions that all codes must have (e.g. k and n in the above notation, base_ring for the alphabet). It might have functions for calculating various properties of the code (e.g. minimum_distance or weight enumerator). It has an encode function taking a list of length k over the base ring, which will return the corresponding codeword. Such a class should also have a decode function implementing the fastest "standard" decoding algorithm (e.g. for Reed-Solomon codes, this could be the Berlekamp-Massey decoder). If more than one decoding algorithm has been implemented, it could be specified as an optional second argument which one to use. As mentioned, one particular family of codes might be completely contained in another family while partly overlapping with several others (for example, a Reed-Solomon code is always a linear code but only sometimes cyclic). This could be solved by multiple inheritance as well as a code class for each possible intersection between any two families of codes. However, even though they are nice to have handy, most of the time a sub-family does not need the specificities of its super-family (e.g. when working with a Reed-Solomon code, it is only rarely interesting to know its generator matrix), so time should not be wasted on computing these in all cases -- something which is hard to avoid and predict when using inheritance. Instead, I propose using a combination of direct conversion functions as well as the "Quack like a duck" type-ideology of Python. For example, the ReedSolomonCode class should not inherit from LinearCode, but has a function to_linear_code, which constructs and returns the appropriate LinearCode instance. Furthermore, for convenience, the ReedSolomonCode itself implements most of the functions usually queried (e.g. minimum_distance, which, coincidentally, is easily calculated for this Reed-Solomon codes), either by a sub-family specific reimplementation or by constructing the LinearCode in the background (and saving it for future queries) and forwarding the function call. This last part should be to such an extend that most functions expecting a LinearCode can take an instance of ReedSolomonCode instead -- e.g. a generic testing framework. This also rather elegantly extends to the case of overlapping families: a ReedSolomonCode has the functions is_cyclic and to_cyclic_code, where the latter raises an exception in the case that the particular Reed- Solomon code is not cyclic. As a last remark, this suggestion could also be extended to support some form of coercion between the classes, such that e.g. {{{ R = ReedSolomonCode(<appropriate constructor information>) R == R.to_linear_code() True }}} However, this also raises the question of equivalence between codes and should perhaps be deferred to a later discussion. So a code should be a class and a specific code an instance of one such. However, for simplicity -- and because more generality seems unnecessary --, I would keep the data strings as Python lists over the base ring (with no specific encapsulating classes). So some code's encode-function should take a list of the appropriate length and over the code's base_ring. This would probably call for a module for manipulating these lists as codewords (e.g. when decoding Reed-Solomon codes, one often goes back and forth between a list representation and a polynomial representation of the codewords) Continuing to the decoding concerns. The protocol for a decoding algorithm should be: return the one result in case of a unique- decoding algorithm; raise an exception (DecodingFailed) in case of total failure (possibly containing a best guess (usable e.g. in the case of systematic encoding) or other decoding-algorithm-specific information); and return a list of results in the case of successful list decoding. A decoder that might sometimes return a list should always do so for type safety, and a decoding algorithm should have a function is_list_decoder returning whether or not this is the case. Many decoders have to be initialised with a lot of precomputation. A different concern is that the same decoding algorithm might work for several families of codes (without one family being directly contained in another), or there might be variants of the algorithm for different families of codes which are so related that they could share a lot of code. One way to handle all of this -- keeping the above suggestion for code family representation in mind -- is for decoding algorithms to be implemented as classes themselves, which are then instantiated whenever some code's decode function is called. This also puts a nice structure on the source code (for example, the Guruswami-Sudan algorithm for both Reed-Solomon codes and general AG-codes can both reside in decoders.GuruswamiSudan if that proved favorable). The decode function of some code class would then simply be to instantiate the appropriate decoding algorithm -- if it had not already been instantiated and saved -- and use this object's decode function. I hope this conveys my idea on the framework. I look forward to your thoughts. Cheers, Johan -- To post to this group, send an email to sage-devel@googlegroups.com To unsubscribe from this group, send an email to sage-devel+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/sage-devel URL: http://www.sagemath.org