Ok, I've attached the proto PEP below. Comments on the proto PEP and the implementation are appreciated.
Sw.

Title: Secure, standard serialization of simple Python types

Abstract

This PEP suggests the addition of a module to the standard library which provides a serialization class for simple Python types.

Copyright

This document is placed in the public domain.

Motivation

The standard library currently provides two modules which are used for object serialization. Pickle is not secure by its very nature, and the marshal module is clearly marked as not secure in the documentation. The marshal module also does not guarantee compatibility between Python versions. The proposed module will serialize only simple built-in Python types, and will provide compatibility across Python versions.

See RFE 467384 (on SourceForge) for more discussion of the above issues.

Specification

The proposed module should use the same API as the marshal module.

    dump(value, file)   # serialize value and write it to an open file object
    load(file)          # read data from a file object, unserialize it and return an object
    dumps(value)        # return the string that dump would write to the file
    loads(value)        # unserialize the string and return an object

Reference Implementation

http://metaplay.dyndns.org:82/~simon/gherkin.py.txt

Rationale

The marshal documentation explicitly states that it is unsuitable for unmarshalling untrusted data. It also explicitly states that the format is not compatible across Python versions.

Pickle is compatible across versions, but is also unsafe for loading untrusted data. Exploits demonstrating the pickle vulnerability exist.

xmlrpclib provides serialization functions, but is unsuitable for serializing large data structures, or when high performance is a requirement. If performance is an issue, a C-based accelerator module can be installed. If size is an issue, gzip can be used. However, each remedy works against the other, so size and performance become a mutually exclusive trade-off.

Other existing formats, such as JSON and Bencode (BitTorrent), do not handle some marginally complex Python structures and/or all of the simple Python types.
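The claim that pickle is unsafe for untrusted data can be illustrated with a short, deliberately harmless sketch. The class name Payload and the evaluated expression are made up for illustration; a real exploit would name a far more dangerous callable in place of the arithmetic shown here.

```python
import pickle


class Payload:
    """Hypothetical attacker-controlled object (name chosen for illustration)."""

    def __reduce__(self):
        # pickle stores "call eval with this argument" in the byte stream.
        # Whoever unpickles the stream will make that call.
        return (eval, ("6 * 7",))


data = pickle.dumps(Payload())

# The receiver only calls loads(), yet the expression is evaluated on their
# side, and what comes back is the result of the call, not a Payload instance.
result = pickle.loads(data)
print(type(result).__name__, result)
```

Nothing in the receiving code asks for eval to be called; the instruction travels inside the serialized data, which is why restricting a format to plain data types matters for untrusted input.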
Time and space efficiency and security do not have to be mutually exclusive features of a serializer. Python does not provide, in the standard library, a serializer that is both time and space efficient and safe to use with untrusted data. The proposed gherkin module goes some way towards achieving this. The format is simple enough that interoperable implementations can easily be written on other platforms.
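The marshal-style API described in the Specification can be sketched roughly as follows. This is not the gherkin format or its implementation: the one-byte type tags, the length encoding and the helper names are assumptions made for illustration, and the sketch targets current Python, where the serialized form is a bytes object. A hardened version would also cap lengths and nesting depth before trusting them.

```python
import io
import struct


def _read_exact(inp, n):
    data = inp.read(n)
    if len(data) != n:
        raise ValueError("truncated input")
    return data


def _read_len(inp):
    # Lengths are stored as 4-byte big-endian unsigned integers (an assumption).
    return struct.unpack(">I", _read_exact(inp, 4))[0]


def _write(value, out):
    kind = type(value)
    if value is None:
        out.write(b"N")
    elif kind is bool:
        out.write(b"T" if value else b"F")
    elif kind is int:
        digits = str(value).encode("ascii")
        out.write(b"i" + struct.pack(">I", len(digits)) + digits)
    elif kind is float:
        out.write(b"f" + struct.pack(">d", value))
    elif kind is str:
        raw = value.encode("utf-8")
        out.write(b"s" + struct.pack(">I", len(raw)) + raw)
    elif kind is bytes:
        out.write(b"b" + struct.pack(">I", len(value)) + value)
    elif kind in (list, tuple):
        out.write((b"l" if kind is list else b"t") + struct.pack(">I", len(value)))
        for item in value:
            _write(item, out)
    elif kind is dict:
        out.write(b"d" + struct.pack(">I", len(value)))
        for key, item in value.items():
            _write(key, out)
            _write(item, out)
    else:
        # Anything that is not a simple built-in type is rejected outright.
        raise TypeError("cannot serialize %s" % kind.__name__)


def _read(inp):
    tag = _read_exact(inp, 1)
    if tag == b"N":
        return None
    if tag in (b"T", b"F"):
        return tag == b"T"
    if tag == b"i":
        return int(_read_exact(inp, _read_len(inp)).decode("ascii"))
    if tag == b"f":
        return struct.unpack(">d", _read_exact(inp, 8))[0]
    if tag == b"s":
        return _read_exact(inp, _read_len(inp)).decode("utf-8")
    if tag == b"b":
        return _read_exact(inp, _read_len(inp))
    if tag in (b"l", b"t"):
        items = [_read(inp) for _ in range(_read_len(inp))]
        return items if tag == b"l" else tuple(items)
    if tag == b"d":
        result = {}
        for _ in range(_read_len(inp)):
            key = _read(inp)
            result[key] = _read(inp)
        return result
    # Unknown tags are an error; no code or class is ever looked up by name.
    raise ValueError("unknown type tag %r" % tag)


def dump(value, file):
    _write(value, file)


def load(file):
    return _read(file)


def dumps(value):
    buf = io.BytesIO()
    _write(value, buf)
    return buf.getvalue()


def loads(data):
    inp = io.BytesIO(data)
    value = _read(inp)
    if inp.read(1):
        raise ValueError("trailing data after value")
    return value


record = {"name": "spam", "sizes": [1, 2.5, -3], "flags": (True, None)}
assert loads(dumps(record)) == record
```

The reader only ever constructs the built-in types listed in the tag table, so malformed or hostile input can at worst raise an exception. That property, rather than the particular byte layout chosen here, is the point of the sketch.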