date:20110611

Re: [Python-Dev] Python jails

2011-06-11 Thread Victor Stinner


On 11/06/2011 02:41, R. David Murray wrote:

I haven't read through your post, but if you don't know about it I
suspect that you will be interested in the following:

 http://code.activestate.com/pypm/pysandbox/

I'm pretty sure Victor will be happy to have someone else interested in
this topic.
   

Yes, I am happy :-) The project URL is https://github.com/haypo/pysandbox/

The ActiveState page is wrong: pysandbox does support Python 3 (Python 2.5 - 
3.3).


pysandbox uses a different policy depending on the problem: for example, 
a whitelist for builtins and a blacklist for object attributes. pysandbox is 
based on Tav's ideas.


The main idea of pysandbox is to execute untrusted code in a new empty 
namespace, the untrusted namespace. Objects imported into this namespace 
are imported as proxies to get a read-only view of the Python namespace. 
Importing modules is protected by a whitelist (module and symbol 
names). To protect the namespace, some introspection attributes such as 
__subclasses__ or __self__ are hidden. Performance is supposed to be 
close to that of a classic Python interpreter (I didn't run a benchmark, I 
don't really care).
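
A minimal sketch of the empty-namespace idea with a whitelist for builtins. This is not pysandbox's API: ALLOWED_BUILTINS and run_untrusted are hypothetical names, and a namespace on its own is not a secure sandbox (no proxies, no hidden attributes, no bytecode restrictions).

```python
import builtins

# Hypothetical whitelist, chosen only for illustration.
ALLOWED_BUILTINS = ("abs", "len", "range", "sum")

def run_untrusted(source):
    """Execute source in a fresh namespace exposing only whitelisted builtins."""
    safe_builtins = {name: getattr(builtins, name) for name in ALLOWED_BUILTINS}
    namespace = {"__builtins__": safe_builtins}
    exec(source, namespace)
    return namespace

ns = run_untrusted("total = sum(range(4))")
print(ns["total"])  # 6

try:
    run_untrusted("open('/etc/passwd')")
except NameError as exc:
    # open() is not in the whitelist, so the name cannot be resolved.
    print("denied:", exc)
```

Introspection tricks get around a restriction like this easily, which is why the hidden attributes and proxies described above are needed on top of it.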


An empty namespace is not enough to protect Python: pysandbox also denies 
executing arbitrary bytecode, writing files, writing to stdout/stderr, 
exiting Python, etc. Tav's sandbox is good at denying everything, whereas you 
can configure pysandbox to enable some features (e.g. exiting Python, 
useful for an interpreter).


About restricted mode: you can also configure pysandbox to use it, but 
the restricted mode is too restrictive: you cannot open files, 
whereas pysandbox allows reading files listed in a whitelist (e.g. useful to 
display a backtrace).
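
A rough sketch of what a read-only file whitelist could look like. The helper name make_guarded_open is hypothetical and this is not how pysandbox implements it; it only illustrates "reading is allowed for listed paths, everything else is denied".

```python
import os

def make_guarded_open(allowed_paths):
    """Return an open() replacement limited to reading whitelisted paths."""
    allowed = {os.path.realpath(p) for p in allowed_paths}

    def guarded_open(path, mode="r"):
        # Only plain read modes are accepted.
        if mode not in ("r", "rb"):
            raise ValueError("invalid file mode: %r" % (mode,))
        # Resolve symlinks before comparing against the whitelist.
        if os.path.realpath(path) not in allowed:
            raise PermissionError("read access denied: %r" % (path,))
        return open(path, mode)

    return guarded_open
```

Such a wrapper would replace open in the untrusted namespace; it does nothing about other ways of reaching the file system.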


If you would like to implement your own sandbox: great! You should try 
the pysandbox test suite, I'm proud of it :-)


I am still not sure that the pysandbox approach is the right one: if you find 
a vulnerability to escape the pysandbox "jail" (see the pysandbox Changelog, it 
would not be the first time), you can do anything. PyPy sandbox and 
"Seccomp nurse" (specific to Linux?) use two processes: the Python 
process cannot do anything, it relies completely on a trusted process 
which controls all operations. I don't understand exactly how it is 
different: a vulnerability in the trusted process also gives full 
control, but it's maybe a safer approach. Or the difference is maybe 
that the implementation is simpler (less code?) and so safer (less code 
usually means fewer bugs).


"Seccomp nurse":
http://chdir.org/~nico/seccomp-nurse/

I recently tested the AppEngine sandbox (testable online via 
http://shell.appspot.com/): it is secure *and* powerful, almost all 
modules are allowed (except ctypes, as expected). AppEngine is not a 
Python sandbox: it's a sandbox between the Python process and the Linux 
kernel, so it also protects modules written in C (pysandbox is unable to 
protect these modules). AppEngine modifies the Python standard library 
to cooperate with the low-level sandbox, e.g. to raise nice error messages 
with open(filename, "w"): invalid file mode (instead of an ugly OSError 
with a cryptic message).


You can find more information about pysandbox and other sandboxes in the 
pysandbox README file.


Victor
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] PEP 3101 implementation vs. documentation

2011-06-11 Thread Nick Coghlan

On Sat, Jun 11, 2011 at 7:15 AM, Ben Wolfson  wrote:
[snip very thorough analysis]

To summarise (after both the above post and the discussion on the tracker)

The current str.format implementation differs from the documentation
in two ways:

1. It ignores the presence of an unclosed index field when processing
a replacement field (placing additional restrictions on allowable
characters in index strings).
2. Replacement fields that appear in name specifiers are processed by
the parser for brace-matching purposes, but not substituted

More accurate documentation would state that:

1. Numeric name fields start with a digit and are terminated by any
non-numeric character.

2. An identifier name field is terminated by any one of:
'}' (terminates the replacement field, unless preceded by a
matching '{' character, in which case it is ignored and included in
the string)
'!' (terminates name field, starts conversion specifier)
':' (terminates name field, starts format specifier)
'.' (terminates current name field, starts new name field for subattribute)
'[' (terminates name field, starts index field)

3. An index field is terminated by one of:
'}' (terminates the replacement field, unless preceded by a
matching '{' character, in which case it is ignored and included in
the string)
'!' (terminates index field, starts conversion specifier)
':' (terminates index field, starts format specifier)
']' (terminates index field, subsequent character will determine next field)

This existing behaviour can certainly be documented as such, but is
rather unintuitive and (given that '}', '!' and ']' will always error
out if appearing in an index field) somewhat silly.
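
To check how a given interpreter splits a replacement field, string.Formatter().parse from the standard library can be used. The helper name split_fields is just for illustration, and only uncontroversial inputs are shown here, since the edge cases discussed above may behave differently between versions.

```python
from string import Formatter

def split_fields(fmt):
    """Return (literal_text, field_name, format_spec, conversion) tuples."""
    return list(Formatter().parse(fmt))

print(split_fields("{0.name[key]!r:>10}"))
# [('', '0.name[key]', '>10', 'r')]
print(split_fields("a{x}b"))
# [('a', 'x', '', None), ('b', None, None, None)]
```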

So, the two changes that I believe Ben is proposing would be as follows:

1. When processing a name field, brace-matching is suspended. Between
the opening '{' character and the closing '}', '!' or ':' character,
additional '{' characters are ignored for matching purposes.
2. When processing an index field, all special processing is suspended
until the terminating ']' is reached

The rules for name fields would then become:

1. Numeric fields start with a digit and are terminated by any
non-numeric character.

2. An identifier name field is terminated by any one of:
'}' (terminates the replacement field)
'!' (terminates identifier field, starts conversion specifier)
':' (terminates identifier field, starts format specifier)
'.' (terminates identifier field, starts new identifier field for
subattribute)
'[' (terminates identifier field, starts index field)

3. An index field is terminated by ']' (subsequent character will
determine next field)

That second set of rules is *far* more in line with the behaviour of
the rest of the language than the status quo, so unless the difficulty
of making the str.format mini-language parser work that way is truly
prohibitive, it certainly seems worthwhile to tidy up the semantics.
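
The second set of rules can be sketched as a toy scanner. This is only an illustration under stated assumptions, not the CPython implementation: parse_name_field is a hypothetical name, rule 1 (numeric fields) is not handled separately, and the text passed in is whatever follows the opening '{'.

```python
def parse_name_field(text):
    """Scan a name field from the text that follows an opening '{'.

    Returns (parts, end): parts is a list of (kind, value) pairs with kind
    in {"name", "attr", "index"}; end is the offset of the '}', '!' or ':'
    that terminated the name field.
    """
    parts = []
    kind, buf = "name", []
    after_index = False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in "}!:":
            # End of the name field; no brace matching is attempted.
            if buf:
                parts.append((kind, "".join(buf)))
            return parts, i
        if ch == ".":
            if buf:
                parts.append((kind, "".join(buf)))
            kind, buf, after_index = "attr", [], False
        elif ch == "[":
            if buf:
                parts.append((kind, "".join(buf)))
            close = text.find("]", i + 1)
            if close == -1:
                raise ValueError("missing ']' in index field")
            # Everything up to ']' is plain data: no special processing.
            parts.append(("index", text[i + 1:close]))
            kind, buf, after_index = "attr", [], True
            i = close
        else:
            if after_index:
                raise ValueError("only '.' or '[' may follow ']'")
            buf.append(ch)
        i += 1
    raise ValueError("unterminated replacement field")

print(parse_name_field("0[{}.}]}"))
# ([('name', '0'), ('index', '{}.}')], 7)
print(parse_name_field("a.b[c]!r}"))
# ([('name', 'a'), ('attr', 'b'), ('index', 'c')], 6)
```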

The index field behaviour should definitely be fixed, as it poses no
backwards compatibility concerns. The brace matching behaviour should
probably be left alone, as changing it would potentially break
currently valid format strings (e.g. "{a{0}}".format(**{'a{0}':1})
produces '1' now, but would raise an exception if the brace matching
rules were changed).

So +1 on making the str.format parser accept anything other than ']'
inside an index field and turn the whole thing into an ordinary
string, -1 on making any other changes to the brace-matching
behaviour.

That would leave us with the following set of rules for name fields:

1. Numeric fields start with a digit and are terminated by any
non-numeric character.

2. An identifier name field is terminated by any one of:
'}' (terminates the replacement field, unless preceded by a
matching '{' character, in which case it is ignored and included in
the string)
'!' (terminates identifier field, starts conversion specifier)
':' (terminates identifier field, starts format specifier)
'.' (terminates identifier field, starts new identifier field for
subattribute)
'[' (terminates identifier field, starts index field)

3. An index field is terminated by ']' (subsequent character will
determine next field)

Note that brace-escaping currently doesn't work inside name fields, so
that should also be fixed:

>>> "{0[{{]}".format({'{':1})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: unmatched '{' in format
>>> "{a{{}".format(**{'a{':1})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: unmatched '{' in format

As far as I can recall, the details of this question didn't come up
when PEP 3101 was developed, so the PEP isn't a particularly good
source to justify anything in relation to this - it is best to
consider the current behaviour to just be the way it happened to be
implemented rather than a deliberate design choice.

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Austr

Re: [Python-Dev] PEP 3101 implementation vs. documentation

2011-06-11 Thread Petri Lehtinen

Nick Coghlan wrote:
[snip]
> The rules for name fields would then become:
> 
> 1. Numeric fields start with a digit and are terminated by any
> non-numeric character.
> 
> 2. An identifier name field is terminated by any one of:
> '}' (terminates the replacement field)
> '!' (terminates identifier field, starts conversion specifier)
> ':' (terminates identifier field, starts format specifier)
> '.' (terminates identifier field, starts new identifier field for
> subattribute)
> '[' (terminates identifier field, starts index field)
> 
> 3. An index field is terminated by ']' (subsequent character will
> determine next field)

+1

> That second set of rules is *far* more in line with the behaviour of
> the rest of the language than the status quo, so unless the difficulty
> of making the str.format mini-language parser work that way is truly
> prohibitive, it certainly seems worthwhile to tidy up the semantics.
> 
> The index field behaviour should definitely be fixed, as it poses no
> backwards compatibility concerns. The brace matching behaviour should
> probably be left alone, as changing it would potentially break
> currently valid format strings (e.g. "{a{0}}".format(**{'a{0}':1})
> produces '1' now, but would raise an exception if the brace matching
> rules were changed).

-1 for leaving the brace matching behavior alone, as it's very
unintuitive for *the user*. For the implementor it may make sense to
count matching braces, but definitely not for the user. I don't
believe that "{a{0}}" is a real use case that someone might already
use, as it's a hard violation of what the documentation currently
says.

I'd rather disallow braces in the replacement field before the format
specifier altogether. Or closing braces at the minimum. Furthermore,
the double-escaping sounds reasonable in the format specifier, but not
elsewhere.

My motivation is that the user should be able to glance quickly
at the format string and see where the replacement fields are. This is
probably what the PEP intends to say when disallowing braces inside
the replacement field. In my opinion, it's easy to write the parser in
a way that braces are parsed in any imaginable manner. Or maybe not
easy, but not harder than any other way of handling braces.

> So +1 on making the str.format parser accept anything other than ']'
> inside an index field and turn the whole thing into an ordinary
> string, -1 on making any other changes to the brace-matching
> behaviour.
> 
> That would leave us with the following set of rules for name fields:
> 
> 1. Numeric fields start with a digit and are terminated by any
> non-numeric character.
> 
> 2. An identifier name field is terminated by any one of:
> '}' (terminates the replacement field, unless preceded by a
> matching '{' character, in which case it is ignored and included in
> the string)
> '!' (terminates identifier field, starts conversion specifier)
> ':' (terminates identifier field, starts format specifier)
> '.' (terminates identifier field, starts new identifier field for
> subattribute)
> '[' (terminates identifier field, starts index field)
> 
> 3. An index field is terminated by ']' (subsequent character will
> determine next field)
> 
> Note that brace-escaping currently doesn't work inside name fields, so
> that should also be fixed:
> 
> >>> "{0[{{]}".format({'{':1})
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: unmatched '{' in format
> >>> "{a{{}".format(**{'a{':1})
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: unmatched '{' in format

-1. Why do we need braces inside replacement fields at all (except for
inner replacements in the format specifier)? I strongly believe that the
PEP's use case is the simple one:

'{foo}'.format(foo=10)

In my opinion, these '{!#%}'.format(**{'!#%': 10}) cases are not real.
The current documentation requires field_name to be a valid
identifier, and this is a sane requirement. The only problem is that
parsing identifiers correctly is very hard, so it can be made simpler
by allowing some non-identifiers. But we still don't have to accept
braces.

---

As a somewhat separate issue, I'm confused about this:

  >>> '{a[1][2]}'.format(a={1:{2:3}})
  '3'

and even more about this:

  >>> '{a[1].foo[2]}'.format(a={1:namedtuple('x', 'foo')({2:3})})
  '3'

Why does this work? It's against the current documentation. The
documented syntax only allows zero or one attribute names and zero or
one element index, in this order. Is it intentional that we allow
arbitrary chains of getattr and __getitem__? If we do, this should be
documented, too.
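
For reference, the chained lookup can be reproduced with string.Formatter.get_field from the standard library, which resolves a field name against the arguments step by step. The variable names below are made up for the example; this shows what the implementation does, not what the documentation promises.

```python
from collections import namedtuple
from string import Formatter

X = namedtuple("x", "foo")
a = {1: X({2: 3})}

# get_field returns the resolved object and the first name that was looked up.
obj, first = Formatter().get_field("a[1].foo[2]", (), {"a": a})
print(obj, first)  # 3 a

# The same lookup written as ordinary Python:
print(a[1].foo[2])  # 3
```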

Petri

Re: [Python-Dev] PEP 3101 implementation vs. documentation

2011-06-11 Thread Ben Wolfson

On Sat, Jun 11, 2011 at 2:16 AM, Nick Coghlan  wrote:
> On Sat, Jun 11, 2011 at 7:15 AM, Ben Wolfson  wrote:
> To summarise (after both the above post and the discussion on the tracker)

Thanks for the summary!

>
> That would leave us with the following set of rules for name fields:
>
> 1. Numeric fields start with a digit and are terminated by any
> non-numeric character.
>
> 2. An identifier name field is terminated by any one of:
>    '}' (terminates the replacement field, unless preceded by a
> matching '{' character, in which case it is ignored and included in
> the string)
>    '!' (terminates identifier field, starts conversion specifier)
>    ':' (terminates identifier field, starts format specifier)
>    '.' (terminates identifier field, starts new identifier field for
> subattribute)
>    '[' (terminates identifier field, starts index field)
>
> 3. An index field is terminated by ']' (subsequent character will
> determine next field)

A minor clarification since I mentioned a patch: the patch as it
exists implements *these*---Nick's---semantics. That is, it will allow
these:

"{0.{a}}".format(x)
"{0.{[{].}}".format(x)

But not this, because it keeps current brace-matching in this context:

"{0.{a}".format(x)

And it treats this:

"{0.a}}".format(x)

as the markup "{0.a}" followed by the character data "}".

The patch would have to be changed to turn off brace balancing in name
fields as well.

In either case there would be potential breakage, since this:

"{0[{}.}}".format(...)

currently works, but would not work anymore, under either set of
rules. (The likelihood that this potential breakage would anywhere be
actual breakage is pretty slim, though.)

> Note that brace-escaping currently doesn't work inside name fields, so
> that should also be fixed:
>
> >>> "{0[{{]}".format({'{':1})
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: unmatched '{' in format
> >>> "{a{{}".format(**{'a{':1})
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: unmatched '{' in format

This is a slightly different issue, though, isn't it? As far as I can
tell, if the brace-matching rules are kept in place, there would never
be any *need* for escaping. You can't have an internal replacement
field in this part of the replacement field, so '{' can always safely
be assumed to be Just a Brace and not the start of a replacement
field, regardless of whether it's doubled, and '}' will either be in
an index field (where it can't have the significance of ending the
replacement field) or it will be (a) the end of the replacement field
or (b) not the end of the replacement field because matched by an
earlier '{'. So there would never be any role for escaping to play.

There would be a role for escaping if the rules for name fields are
that '}' terminates them, no matching done; then, you could double
them to get a '}' in the name field. But, to be honest, that strikes
me as introducing a lot of heavy machinery for very little gain;
opening and closing braces would have to be escaped to accommodate this
one thing. And it's not as if you can escape ']' in index
fields---which would be a parallel case. It seems significantly
simpler to me to leave the escaping behavior as it is in this part of
the replacement field.

-- 
Ben Wolfson
"Human kind has used its intelligence to vary the flavour of drinks,
which may be sweet, aromatic, fermented or spirit-based. ... Family
and social life also offer numerous other occasions to consume drinks
for pleasure." [Larousse, "Drink" entry]

Re: [Python-Dev] [Python-checkins] cpython (3.1): onto 3.1.5

2011-06-11 Thread Benjamin Peterson

2011/6/11 Terry Reedy :
>
>> +What's New in Python 3.1.5?
>> +===
>> +
>> +*Release date: -XX-XX*
>> +
>> +Core and Builtins
>> +-
>> +
>> +Library
>> +---
>> +
>> +
>
> I presume that only security patches should be added.

Indeed.


-- 
Regards,
Benjamin

Re: [Python-Dev] PEP 3101 implementation vs. documentation

2011-06-11 Thread Terry Reedy


On 6/11/2011 6:32 AM, Petri Lehtinen wrote:

Nick Coghlan wrote:
[snip]


It seems to me that the intent of the pep and the current doc is that 
field_names should match what one would write in code except that quotes 
are left off of literal string keys. Which is to say, the brackets [] 
serve as quote marks. So once '[' is found, the scanner must shift to 
'in index' mode and accept everything until a matching ']' is found, 
ending 'in index' mode.
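
A small illustration of the "brackets serve as quote marks" reading, using only str.format behaviour that is documented. The dictionary contents are made up for the example.

```python
data = {"key": "by string", 1: "by int", "1": "by digit string"}

# Inside [...] a non-numeric key is written without quotes.
print("{0[key]}".format(data))  # by string

# An all-digit index is converted to an integer, so the string key '1'
# cannot be reached this way.
print("{0[1]}".format(data))    # by int

# The equivalent lookups in code need explicit quotes for the string key.
print(data["key"], data[1])     # by string by int
```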


The arg_name is documented as int or identifier and attribute_name as 
identifier, period. Anything more than that is an implementation 
accident which people should not count on in either future versions or 
alternate implementations.


I can imagine uses for nested replacement fields in the field_name or 
conversion spec. I.e., '{ {0}:{1}d'.format(2,5,333,444) == '  333', 
whereas changing the first arg to 3 would produce '  444'. If braces are 
allowed in either of the first two segments (outside of the 'quoted' 
within braces context), I think it should only be for the purpose of a 
feature addition that makes them significant.


It strikes me that the underlying problem is that the replacement_field 
scanner is, apparently, hand-crafted rather than being generated from 
the corresponding grammar, as is the overall Python lexer-parser. So it 
has no necessary connection with the grammar.


--
Terry Jan Reedy


Re: [Python-Dev] PEP 3101 implementation vs. documentation

2011-06-11 Thread Greg Ewing


Ben Wolfson wrote:

You can't have an internal replacement
field in this part of the replacement field, so '{' can always safely
be assumed to be Just a Brace and not the start of a replacement
field, regardless of whether it's doubled,


I'm worried that the rules in this area are getting too
complicated for a human to follow. If braces are allowed
as plain data between square brackets and/or vice versa,
it's going to be a confusing mess to read, and there will
always be some doubt in the programmer's mind as to whether
they have to be escaped somehow or not.

I'm inclined to think that any such difficult cases should
simply be disallowed. If the docs say an identifier is required
someplace, the implementation should adhere strictly to that.

It's not *that* hard to parse an identifier properly, and
IMO any use case that requires putting arbitrary characters
into an item selector is abusing the format mechanism and
should be redesigned to work some other way.
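
A sketch of what a strict check might look like, assuming the strictest reading: the argument name and every attribute must be identifiers (or digits for the argument name), and an index must be digits or an identifier. The function name is_strict_field_name and that last restriction on indexes are assumptions made for illustration, not something the docs or the PEP spell out.

```python
import re

# One accessor: ".attr" or "[index]".
_ACCESSOR = re.compile(r"\.([^.\[\]]+)|\[([^\]]+)\]")

def is_strict_field_name(field_name):
    """Return True if field_name uses only identifiers and digit strings."""
    head = re.match(r"[^.\[\]]*", field_name)
    arg = head.group()
    # An empty arg_name is allowed (automatic numbering).
    if arg and not (arg.isdigit() or arg.isidentifier()):
        return False
    pos = head.end()
    while pos < len(field_name):
        m = _ACCESSOR.match(field_name, pos)
        if m is None:
            return False
        attr, index = m.groups()
        if attr is not None and not attr.isidentifier():
            return False
        if index is not None and not (index.isdigit() or index.isidentifier()):
            return False
        pos = m.end()
    return True

print(is_strict_field_name("a[1].foo[2]"))  # True
print(is_strict_field_name("a{0}"))         # False
print(is_strict_field_name("0[{]"))         # False
```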

--
Greg
