> There are a couple factual inaccuracies on the site that I'd like to clear up
> first:
> Trivial benchmarks put cerealizer and banana/jelly on the same level as far
> as performance goes:
> $ python -m timeit -s 'from cereal import dumps; L = ["Hello", " ", ("w",
> "o", "r", "l", "d", ".")]' 'dumps(L)'
> 1 loops, best of 3: 84.1 usec per loop
> $ python -m timeit -s 'from twisted.spread import banana, jelly; dumps =
> lambda o: banana.encode(jelly.jelly(o)); L = ["Hello", " ", ("w", "o", "r",
> "l", "d", ".")]' 'dumps(L)'
> 1 loops, best of 3: 89.7 usec per loop
>
> This is with cBanana though, which has to be explicitly enabled and, of
> course, is written in C. So Cerealizer looks like it has the potential to do
> pretty well, performance-wise.
My personal benchmark was different; it used a list of 2000
objects defined as follows:
class O(object):
    def __init__(self):
        self.x = 1
        self.s = "jiba"
        self.o = None
with self.o referring to another O object. I think my benchmark,
although still very limited, is more representative since it involves
objects, strings, numbers and lists.
See it here:
http://svn.gna.org/viewcvs/*checkout*/soya/trunk/cerealizer/test/test1.py?content-type=text%2Fplain&rev=31
The results are (using Psyco):

With old-style classes:
  cerealizer
    dumps in 0.0619530677795 s, 114914 bytes length
    loads in 0.0313038825989 s
  cPickle
    dumps in 0.0301840305328 s, 116356 bytes length
    loads in 0.023097038269 s
  jelly + banana
    dumps in 0.168012142181 s, 169729 bytes length
    loads in 1.82081913948 s
  jelly + cBanana
    dumps in 0.082946062088 s, 169729 bytes length
    loads in 0.15615987 s

With new-style classes:
  cerealizer
    dumps in 0.0575239658356 s, 114914 bytes length
    loads in 0.028165102005 s
  cPickle
    dumps in 0.07634806633 s, 116428 bytes length
    loads in 0.0278959274292 s
  jelly + banana
    dumps in 0.156242132187 s, 169729 bytes length
    loads: TypeError (I haven't investigated this problem yet, although
    it is surely solvable)
  jelly + cBanana
    dumps in 0.10772895813 s, 169729 bytes length
    loads: TypeError (I haven't investigated this problem yet, although
    it is surely solvable)

As you can see, cPickle dumps old-style classes about 2 times faster
than cerealizer (and loads them about 1.4 times faster), but cerealizer
dumps new-style classes faster than cPickle, with loading times nearly
equal (which makes sense since I have optimized it for new-style
classes). Jelly, however, is far behind, even using cBanana, especially
for loading.
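For anyone who wants to try something similar, here is a minimal sketch of a benchmark of this kind. It is not the test1.py linked above: the way the objects are linked (each one pointing at the previously created one) and the helper names build_objects and timed are assumptions made for illustration, and it times only the standard pickle module with a single run per operation.

```python
import pickle
import time


class O(object):
    def __init__(self):
        self.x = 1
        self.s = "jiba"
        self.o = None


def build_objects(count=2000):
    """Return a list of `count` O instances, each linked to the previous one."""
    objs = []
    for _ in range(count):
        obj = O()
        if objs:
            obj.o = objs[-1]  # assumption: refer to the previously created object
        objs.append(obj)
    return objs


def timed(func, *args):
    """Call func(*args) once and return (result, elapsed seconds)."""
    start = time.time()
    result = func(*args)
    return result, time.time() - start


if __name__ == "__main__":
    objs = build_objects()
    data, dump_time = timed(pickle.dumps, objs)
    _, load_time = timed(pickle.loads, data)
    print("pickle")
    print("  dumps in %s s, %s bytes length" % (dump_time, len(data)))
    print("  loads in %s s" % load_time)
```

Another serializer can be compared by passing its own dumps and loads functions to timed in the same way; a single run is noisy, so repeating the measurement is advisable before drawing conclusions.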
> You talked about _Tuple and _Dereference on the website as well. These are
> internal implementation details. jelly also supports extension types, by way
> of setUnjellyableForClass and similar functions.
The problem arises only when the extension type expects an attribute of
a specific class, e.g. (in Pyrex):
cdef class MyClass:
    cdef MyClass other
The other attribute of MyClass can only contain a reference to an
instance of MyClass (or None). Thus it cannot be set to an instance of
_Dereference or _Tuple, even temporarily; doing other =
_Dereference(...) raises an exception.
I solve this problem in Cerealizer by doing a 2-pass object creation:
step 1, create all the objects; step 2, set all objects' states.
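
The two-pass idea can be illustrated with a small, self-contained sketch. This is not Cerealizer's actual code or wire format: Node, Ref, flatten and rebuild are hypothetical names, and the "serialized form" is just an in-memory list of (class, state) records. The point it shows is that a placeholder (Ref) only ever lives in the flattened data, never in an attribute of a live object.

```python
class Node(object):
    def __init__(self, name, other=None):
        self.name = name
        self.other = other


class Ref(object):
    """Placeholder stored in the flattened records, never on a live object."""
    def __init__(self, index):
        self.index = index


def flatten(root):
    """Turn the graph reachable from root into a list of (class, state) records."""
    index_of = {id(root): 0}
    objects = [root]
    records = []
    i = 0
    while i < len(objects):  # objects grows as new nodes are discovered
        obj = objects[i]
        state = {}
        for key, value in vars(obj).items():
            if isinstance(value, Node):
                if id(value) not in index_of:
                    index_of[id(value)] = len(objects)
                    objects.append(value)
                value = Ref(index_of[id(value)])
            state[key] = value
        records.append((type(obj), state))
        i += 1
    return records


def rebuild(records):
    # Pass 1: create every object without initialising it.
    objects = [cls.__new__(cls) for cls, _ in records]
    # Pass 2: set the states; every Ref resolves to an object that already exists.
    for obj, (_, state) in zip(objects, records):
        for key, value in state.items():
            if isinstance(value, Ref):
                value = objects[value.index]
            setattr(obj, key, value)
    return objects[0]


# Two nodes that refer to each other.
a = Node("a")
b = Node("b", a)
a.other = b
copy = rebuild(flatten(a))
```

Because all instances exist before any attribute is assigned, a typed attribute such as the Pyrex "other" above would only ever receive a real instance, even when the graph contains cycles.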
> As far as security goes, no obvious problems jump out at me, either
> from the API or from skimming the code. I think early-binding
> __new__, __getstate__, and __setstate__ may be going further than
> is necessary. If someone can find code to set attributes on classes
> in your process space, they can probably already do anything they
> want to your program and don't need to exploit security problems in
> your serializer.
I agree with that; however, I would rather be "over-secure" than "just
as secure as necessary" :-)
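
To make the early-binding idea concrete, here is a rough sketch under the assumption that it means looking up a class's special methods once, when the class is registered, rather than each time an object is loaded. It is not taken from Cerealizer's source; _handlers, register and create are hypothetical names.

```python
_handlers = {}  # hypothetical registry: class name -> (class, __new__, __setstate__)


def register(cls):
    """Capture the class's special methods now, at registration time."""
    _handlers[cls.__name__] = (cls, cls.__new__, getattr(cls, "__setstate__", None))


def create(name, state):
    """Build an instance of a registered class using only the captured methods."""
    cls, new, setstate = _handlers[name]  # KeyError for unregistered names
    obj = new(cls)
    if setstate is None:
        obj.__dict__.update(state)
    else:
        setstate(obj, state)
    return obj


class Point(object):
    def __setstate__(self, state):
        self.x, self.y = state["x"], state["y"]


register(Point)

# Replacing the method after registration has no effect on loading.
calls = []
Point.__setstate__ = lambda self, state: calls.append("patched")

p = create("Point", {"x": 1, "y": 2})
```

Here p is built with the __setstate__ that was captured at registration, so the later replacement is never called, and names that were never registered are refused outright.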
Thank you for your opinion!
I'm going to update my website.
Jiba