Encoding of sys.argv?

2006-10-23 Thread Jiba
Hi all,

I am desperately searching for the encoding of sys.argv.

I use a Linux box with French UTF-8 locales and a UTF-8 filesystem.
sys.getdefaultencoding() is "ascii" and sys.getfilesystemencoding() is "utf-8".
However, sys.argv is neither in ASCII (since I can pass French accented
characters) nor in UTF-8. It seems to be encoded in "latin-1", but why?
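
One way to check which encoding an argument really uses is to try decoding its raw bytes with each candidate. The following is only a minimal sketch: candidate_decodings is a hypothetical helper, not part of the standard library, and the sample bytes are made up for illustration.

```python
def candidate_decodings(raw, encodings=("ascii", "utf-8", "latin-1")):
    """Return {encoding: decoded text} for each encoding that decodes raw without error."""
    results = {}
    for name in encodings:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            # This encoding cannot represent the byte sequence; skip it.
            pass
    return results


# The same accented character, encoded two different ways (illustrative data).
utf8_bytes = u"\u00e9".encode("utf-8")      # two bytes: 0xC3 0xA9
latin1_bytes = u"\u00e9".encode("latin-1")  # one byte: 0xE9

print(sorted(candidate_decodings(utf8_bytes)))
print(sorted(candidate_decodings(latin1_bytes)))
```

Note that latin-1 accepts any byte sequence, so the useful signal is whether the UTF-8 decoding fails.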

Jiba
-- 
http://mail.python.org/mailman/listinfo/python-list


Accessing the raw data of an email message

2006-11-26 Thread Jiba
Hi,

I'm using the email Python package for parsing mail and checking GPG signature.

The Message object doesn't store the raw message data. Message.as_string() 
"rebuilds" the whole message; for example, it may give:

Content-Disposition: attachment; filename=text.txt

whereas the original message contained:

Content-Disposition: attachment;
 filename=text.txt

From the RFC point of view, both are equivalent. However, they are not the 
same for GPG when it checks the signature, and thus the check fails.

Does anyone have an idea? I think it would be nice to let the parser store the 
raw message data in the Message object.
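
In the meantime, a possible workaround is to keep the source text next to the parsed object. This is only a minimal sketch: ParsedMail is a hypothetical wrapper class, the sample message is made up, and for a real signed message the signed part would still have to be cut out of the raw text before handing it to GPG, which is not shown here.

```python
import email


class ParsedMail(object):
    """Hypothetical wrapper: the parsed Message plus the untouched source text."""

    def __init__(self, raw_text):
        self.raw = raw_text                                  # exact text as received
        self.message = email.message_from_string(raw_text)   # parsed view


# Illustrative message with a folded Content-Disposition header.
raw_text = (
    "Subject: test\n"
    "Content-Disposition: attachment;\n"
    " filename=text.txt\n"
    "\n"
    "body\n"
)

mail = ParsedMail(raw_text)
print(mail.message["Subject"])   # header access goes through the parsed Message
print(mail.raw == raw_text)      # the original folding is still available here
```

Anything that needs byte-exact data would then read mail.raw instead of calling as_string() on the parsed message.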

Thanks,
Jiba


Secure Pickle-like module

2006-05-25 Thread jiba
Hi all,

I'm currently working on a secure Pickle-like module, Cerealizer:
http://home.gna.org/oomadness/en/cerealizer/index.html
Cerealizer has a pickle-like interface (load, dump, __getstate__,
__setstate__, ...); however, it requires you to register each class you
want to "cerealize" by calling cerealizer.register(YourClass).
Cerealizer doesn't import other modules (contrary to pickle), and the
only methods it may call are YourClass.__new__, YourClass.__getstate__
and YourClass.__setstate__ (Cerealizer keeps its own reference to these
three methods, so that YourClass.__setstate__ = cracked_method is
harmless).
Thus, as long as __new__, __getstate__ and __setstate__ are not
dangerous, Cerealizer should be secure.
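
To illustrate the idea of registration with early-bound methods, here is a toy sketch. It is not Cerealizer's actual code: register, dump_state, load_state and the Point class are hypothetical names chosen for this example, and the "serialized" form is just a tuple rather than a byte string.

```python
_registry = {}


def register(cls):
    """Remember cls and early-bind the three callables the serializer may use."""
    _registry[cls.__name__] = (cls, cls.__new__, cls.__getstate__, cls.__setstate__)


def dump_state(obj):
    """Turn a registered object into plain data; unregistered classes raise KeyError."""
    name = type(obj).__name__
    cls, new, getstate, setstate = _registry[name]
    return (name, getstate(obj))


def load_state(data):
    """Rebuild an object using only the callables saved at registration time."""
    name, state = data
    cls, new, getstate, setstate = _registry[name]
    obj = new(cls)            # creates the instance without calling __init__
    setstate(obj, state)
    return obj


class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __getstate__(self):
        return {"x": self.x, "y": self.y}

    def __setstate__(self, state):
        self.x = state["x"]
        self.y = state["y"]


register(Point)


def cracked_method(self, state):
    raise RuntimeError("should never run")


# Replacing the method after registration has no effect on load_state,
# because the registry still holds the original function.
Point.__setstate__ = cracked_method
restored = load_state(dump_state(Point(1, 2)))
print(restored.x, restored.y)
```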

Performance is quite good: with Psyco, it is about as fast as
cPickle, even though Cerealizer is written in less than 300 lines of
pure-Python code.

I would appreciate any comments, especially if there are some security
gurus here :-)

Jiba



Re: Secure Pickle-like module

2006-05-25 Thread jiba
> There are a couple factual inaccuracies on the site that I'd like to clear up 
> first:
> Trivial benchmarks put cerealizer and banana/jelly on the same level as far 
> as performance goes:
> $ python -m timeit -s 'from cereal import dumps; L = ["Hello", " ", ("w", 
> "o", "r", "l", "d", ".")]' 'dumps(L)'
> 1 loops, best of 3: 84.1 usec per loop
> $ python -m timeit -s 'from twisted.spread import banana, jelly; dumps = 
> lambda o: banana.encode(jelly.jelly(o)); L = ["Hello", " ", ("w", "o", "r", 
> "l", "d", ".")]' 'dumps(L)'
> 1 loops, best of 3: 89.7 usec per loop
>
> This is with cBanana though, which has to be explicitly enabled and, of 
> course, is written in C.  So Cerealizer looks like it has the potential to do 
> pretty well, performance-wise.

My personal benchmark was different; it used a list of 2000
objects defined as follows:

class O(object):
  def __init__(self):
    self.x = 1
    self.s = "jiba"
    self.o = None

with self.o referring to another O object. I think my benchmark,
although still very limited, is more representative since it involves
objects, strings, numbers and lists.

See it here:
http://svn.gna.org/viewcvs/*checkout*/soya/trunk/cerealizer/test/test1.py?content-type=text%2Fplain&rev=31

The results are (using Psyco):
With old-style classes:
cerealizer
dumps in 0.0619530677795 s, 114914 bytes length
loads in 0.0313038825989 s

cPickle
dumps in 0.0301840305328 s, 116356 bytes length
loads in 0.023097038269 s

jelly + banana
dumps in 0.168012142181 s, 169729 bytes length
loads in 1.82081913948 s

jelly + cBanana
dumps in 0.082946062088 s, 169729 bytes length
loads in 0.15615987 s

With new-style classes:
cerealizer
dumps in 0.0575239658356 s, 114914 bytes length
loads in 0.028165102005 s

cPickle
dumps in 0.07634806633 s, 116428 bytes length
loads in 0.0278959274292 s

jelly + banana
dumps in 0.156242132187 s, 169729 bytes length
(TypeError; I haven't investigated this problem yet, although it is
surely solvable)

jelly + cBanana
dumps in 0.10772895813 s, 169729 bytes length
(TypeError; I haven't investigated this problem yet, although it is
surely solvable)

As you can see, cPickle dumps about 2 times faster than cerealizer for
old-style classes, but for new-style classes cerealizer dumps faster
than cPickle and loads at about the same speed (which makes sense since
I have optimized it for new-style classes). Jelly, however, is far
behind, even using cBanana, especially for loading.


> You talked about _Tuple and _Dereference on the website as well.  These are 
> internal implementation details. jelly also supports extension types, by way 
> of setUnjellyableForClass and similar functions.

The problem arises only when the extension type expects an attribute of
a specific class, e.g. (in Pyrex):

cdef class MyClass:
  cdef MyClass other

The other attribute of MyClass can only contain a reference to an
instance of MyClass (or None). Thus it cannot be set to an instance of
_Dereference or _Tuple, even temporarily; doing other =
_Dereference(...) raises an exception.

I solve this problem in Cerealizer by doing a 2-pass object creation:
step 1, create all the objects; step 2, set all objects' states.
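
The two-pass idea can be sketched as follows. This is a simplified illustration, not Cerealizer's real loader: two_pass_load and the record format (a tuple whose last item is the index of the referenced object, or None) are assumptions made for the example.

```python
class O(object):
    def __init__(self):
        self.x = 1
        self.s = "jiba"
        self.o = None


def two_pass_load(records):
    """Rebuild O objects from (x, s, index_of_o) records in two passes."""
    # Pass 1: create every object, still empty, so that all references exist.
    objects = [O.__new__(O) for _ in records]
    # Pass 2: set each object's state; references point directly at real
    # objects, so no temporary placeholder is ever stored in an attribute.
    for obj, (x, s, o_index) in zip(objects, records):
        obj.x = x
        obj.s = s
        obj.o = None if o_index is None else objects[o_index]
    return objects


# Two objects referring to each other: a cycle that one-pass creation
# could not resolve without a placeholder.
records = [(1, "jiba", 1), (1, "jiba", 0)]
a, b = two_pass_load(records)
print(a.o is b, b.o is a)
```

Because every object already exists before any state is set, an attribute typed as MyClass only ever receives a MyClass instance or None.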

> As far as security goes, no obvious problems jump out at me, either
> from the API or from skimming the code.  I think early-binding
> __new__, __getstate__, and __setstate__ may be going further than
> is necessary.  If someone can find code to set attributes on classes
> in your process space, they can probably already do anything they
> want to your program and don't need to exploit security problems in
> your serializer.

I agree on that; however, I would rather be "over-secure" than "just as
secure as necessary" :-)

Thank you for your opinion!
I'm going to update my website.
Jiba
