On May 16, 1:13 pm, Victor Kryukov <[EMAIL PROTECTED]> wrote:
> Hello list,
>
> I've found the following strange behavior of cPickle. Do you think
> it's a bug, or is it by design?
>
> Best regards,
> Victor.
>
> from pickle import dumps
> from cPickle import dumps as cdumps
>
> print dumps('1001799')==dumps(str(1001799))
> print cdumps('1001799')==cdumps(str(1001799))
>
> outputs
>
> True
> False
>
> vicbook:~ victor$ python
> Python 2.5 (r25:51918, Sep 19 2006, 08:49:13)
> [GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> quit()
>
> vicbook:~ victor$ uname -a
> Darwin vicbook 8.9.1 Darwin Kernel Version 8.9.1: Thu Feb 22 20:55:00
> PST 2007; root:xnu-792.18.15~1/RELEASE_I386 i386 i386

I might have found the culprit: see
http://svn.python.org/projects/python/trunk/Modules/cPickle.c

The function static int put2(...) has the following code block in it:

---------cPickle.c-----------
int p;
...
if ((p = PyDict_Size(self->memo)) < 0)
    goto finally;

/* Make sure memo keys are positive! */
/* XXX Why?
 * XXX And does "positive" really mean non-negative?
 * XXX pickle.py starts with PUT index 0, not 1.  This makes for
 * XXX gratuitous differences between the pickling modules.
 */
p++;
-------------------------------

p++ will cause the difference. It seems the developers are not quite
sure why it's there, or whether memo keys can start at 0 or have to
start at 1.

Here is the corresponding section of the Python version (pickle.py),
taken from Python 2.5:

---------pickle.py----------
    def memoize(self, obj):
        """Store an object in the memo."""

        # The Pickler memo is a dictionary mapping object ids to 2-tuples
        # that contain the Unpickler memo key and the object being memoized.
        # The memo key is written to the pickle and will become
        # the key in the Unpickler's memo.  The object is stored in the
        # Pickler memo so that transient objects are kept alive during
        # pickling.

        # The use of the Unpickler memo length as the memo key is just a
        # convention.  The only requirement is that the memo values be unique.
        # But there appears no advantage to any other scheme, and this
        # scheme allows the Unpickler memo to be implemented as a plain (but
        # growable) array, indexed by memo key.
        if self.fast:
            return
        assert id(obj) not in self.memo
        memo_len = len(self.memo)
        self.write(self.put(memo_len))
        self.memo[id(obj)] = memo_len, obj

    # Return a PUT (BINPUT, LONG_BINPUT) opcode string, with argument i.
    def put(self, i, pack=struct.pack):
        if self.bin:
            if i < 256:
                return BINPUT + chr(i)
            else:
                return LONG_BINPUT + pack("<i", i)

        return PUT + repr(i) + '\n'
------------------------------------------

In memoize, memo_len is the 'int p' from the C version.
In pickle.py the size starts at 0 and is written as-is, while in the C
version the size is initially 0 but is then incremented with p++.

Do any developers know more about this?

-Nick Vatamaniuc
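
P.S. To make the 0-versus-1 numbering concrete, here is a minimal sketch, written for Python 3 (where cPickle is no longer a separate module) and using only the standard pickle and pickletools modules. The two byte strings are hand-written protocol 0 pickles of the list ['a'], not captured output from either module, and the names put_indexes, zero_based and one_based are made up for this illustration. It only shows that memo numbering changes the bytes without changing what gets unpickled; it does not settle whether p++ is really behind the False result above.

```python
import pickle
import pickletools


def put_indexes(data):
    """Return the memo keys written by PUT-style opcodes in a pickle."""
    put_ops = ("PUT", "BINPUT", "LONG_BINPUT")
    return [arg for op, arg, _pos in pickletools.genops(data)
            if op.name in put_ops]


# Hand-written protocol 0 pickles of ['a'] (illustrative assumptions):
# the first numbers its memo keys from 0, the second from 1.
zero_based = b"(lp0\nVa\np1\na."
one_based = b"(lp1\nVa\np2\na."

print(put_indexes(zero_based))   # [0, 1]
print(put_indexes(one_based))    # [1, 2]

# The raw byte strings differ only because of the memo keys ...
print(zero_based == one_based)   # False

# ... but they unpickle to equal objects,
print(pickle.loads(zero_based) == pickle.loads(one_based))   # True

# and stripping the unused PUT opcodes makes the bytes identical.
print(pickletools.optimize(zero_based) ==
      pickletools.optimize(one_based))                       # True
```

So if the goal is to check whether two objects pickle "the same", comparing the unpickled values (or the pickletools.optimize output) is likely more robust than comparing the raw strings from dumps.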