Re: [Python-Dev] Eliminating loops

2006-07-29 Thread Josiah Carlson

"Charles Vaughn" <[EMAIL PROTECTED]> wrote:
> I'm looking for a way of modifying the compiler to eliminate any loops and
> recursion from code.  It's for a high speed data processing application.
> The alternative is a custom language that is little more than glorified
> assembly.  I'd like to be able to use everything else around Python, but we
> can't allow the users to create more than O(1) complexity.

One of the larger, if not the largest, advances in computer science in the
last 50 years was the design and implementation of looping as a
construct, not just an artifact of structured gotos, but as a method of
implementing algorithms.

With your proposed removal of loops and recursion, you are effectively
saying that users should be given a Turing machine because you are
afraid of them doing foolish things with the language.  Well, since the
user is going to do foolish things with the language anyways, about all
you can really do is to test, analyze, and verify.  Oh, and educate.

What do I mean?  If your users have access to sequences of any type, and
they can't perform 'x in y', then they are going to write their own
contains/index function...

def index(x, y):
    if x == y[0]:
        return 0
    elif x == y[1]:
        return 1
    elif x == y[2]:
        return 2
    ...

If they aren't given access to sequences, then they will do what used to
be done in QuakeC back in the day...

def index(x, y):
    if x == y0:
        return 0
    elif x == y1:
        return 1
    elif x == y2:
        return 2
    ...
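
For comparison, here is a minimal sketch of the loop that both hand-unrolled versions imitate. The name index_with_loop is hypothetical. The unrolled chain performs the same linear number of comparisons, so banning the loop construct does not bound the work.

```python
def index_with_loop(x, y):
    """Return the position of the first item in y that equals x."""
    for i, item in enumerate(y):
        if x == item:
            return i
    # Mirror the behaviour of list.index when nothing matches.
    raise ValueError("%r is not in sequence" % (x,))
```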


While algorithm design and analysis isn't something that everyone can do
(some just can't handle the math), the users really should understand at
least a bit about algorithms before they work on the "high speed data
processing" application.

 - Josiah

___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Using Python docs

2006-07-29 Thread Georg Brandl

Regarding bug 469773, I think it would be great to have such a
document "Using Python", containing the manual page and platform-
specific hints on how to invoke the interpreter and scripts
(e.g. explaining the shebang for Unices).

I'd be willing to help write up such a document.

Another thing that could be helpful is a list of "frequently needed
documentation sections", that is, a list of keywords and respective
links to topics that are hard to find for newbies, such as the section
"String formatting" or some locations in the reference manual.

Georg



Re: [Python-Dev] Bad interaction of __index__ and sequence repeat

2006-07-29 Thread Armin Rigo
Hi Guido,

On Fri, Jul 28, 2006 at 11:31:09AM -0700, Guido van Rossum wrote:
> No time to look through the code here, but IMO it's acceptable (at
> least for 2.5) if (2**100).__index__() raises OverflowError, as long
> as x[:2**100] silently clips. __index__() is primarily meant to return
> a value useful for indexing concrete sequences, and 2**100 isn't.

If nb_index keeps returning a Py_ssize_t with clipping, it means that
there is no way to write in pure Python an object that emulates a long
-- only an int.  Sounds inconsistent with the int/long unification trend
for pure Python code.  It would make it awkward to write, say, pure
Python classes that pretend to be very large sequences, because using
__index__ in such code wouldn't work.

Another example of this is that if places like sequence_repeat are made
to use the following pseudo-logic:

    if isinstance(w, long) and w > sys.maxint:
        raise OverflowError
    else:
        i = w.__index__()

then if an object 'l' is an emulated pseudo-long, then  "x"*l  will
still silently clip the pseudo-long to sys.maxint.

I'm more in favor of changing nb_index to return a PyObject *, since now
is our last chance to do so.  A pair of API functions can be added to
return a Py_ssize_t with either the proper clipping, or the proper
OverflowError'ing.
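
A minimal sketch of the "emulated pseudo-long" case, written for a current Python 3 interpreter rather than the 2.5 code under discussion. PseudoLong and try_repeat are hypothetical names. On current CPython 3 releases, repetition with an oversized count raises OverflowError while slice bounds are clipped; treat that as an assumption about the interpreter you run it on, not as a description of the patch.

```python
class PseudoLong(object):
    """Hypothetical pure-Python object that behaves like a large integer."""

    def __init__(self, value):
        self.value = value

    def __index__(self):
        # Return the full value; no clipping is done at this level.
        return self.value


def try_repeat(seq, count):
    """Return the repeated sequence, or the name of the exception raised."""
    try:
        return seq * count
    except (OverflowError, MemoryError) as exc:
        return type(exc).__name__


big = PseudoLong(2 ** 100)
small = PseudoLong(3)

repeat_small = try_repeat("x", small)   # ordinary repetition
repeat_big = try_repeat("x", big)       # count does not fit in a Py_ssize_t
slice_big = "abc"[:big]                 # slice bounds are clipped
```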


A bientot,

Armin.


Re: [Python-Dev] Bad interaction of __index__ and sequence repeat

2006-07-29 Thread Nick Coghlan
Armin Rigo wrote:
> Hi,
> 
> There is an oversight in the design of __index__() that only just
> surfaced :-(  It is responsible for the following behavior, on a 32-bit
> machine with >= 2GB of RAM:
> 
> >>> s = 'x' * (2**100)   # works!
> >>> len(s)
> 2147483647
> 
> This is because PySequence_Repeat(v, w) works by applying w.__index__ in
> order to call v->sq_repeat.  However, __index__ is defined to clip the
> result to fit in a Py_ssize_t.  This means that the above problem exists
> with all sequences, not just strings, given enough RAM to create such
> sequences with 2147483647 items.
> 
> For reference, in 2.4 we correctly get an OverflowError.
> 
> Argh!  What should be done about it?

I've now got a patch on SF that aims to fix this properly [1].

The gist of the patch:

1. Redesign the PyNumber_Index C API to serve the actual use cases in the 
interpreter core and the standard library.

   The PEP 357 abstract C API as written was bypassed by nearly all of the 
uses in the core and the standard library. The patch redesigns that API to 
reduce code duplication between the various parts of the code base that were 
previously calling nb_index directly.

   The principal change is to provide an "is_index" output variable that the 
various mp_subscript implementations can use to determine whether or not the 
passed in object was an index or not, rather than having to repeat the type 
check everywhere. The rationale for doing it this way:
   a. you may want to try something else (e.g. the mp_subscript 
implementations in the standard library try indexing before checking to see if 
the object is a slice object)
   b. a different error message may be wanted (e.g. the normal indexing 
related TypeError doesn't make sense for sequence repetition)
   c. a separate checking function would lead to repeating the check on common 
code paths (e.g. if an mp_subscript implementation did the type check first, 
and then PyNumber_Check did it again to see whether or not to raise an error)

   The output variable solves the problem nicely: either pass in NULL to get 
the default behaviour of raising a sequence indexing TypeError, or pass in a 
pointer to a C int in order to be told whether or not the typecheck succeeded 
without an exception actually being set if it fails (if the typecheck passes, 
but the actual call fails, the exception state is set as normal).

   Additionally, PyNumber_Index is redefined to raise an IndexError for values 
that cannot be represented as a Py_ssize_t. The choice of IndexError was made 
based on the dominant usage in the standard library (IndexError is the correct 
error to raise so that an mp_subscript implementation does the right thing). 
There are only a few places that need to override the IndexError to replace it 
with OverflowError (the length argument to slice.indices, sequence repetition, 
the mmap constructor), whereas all of the sequence objects (list, tuple, 
string, unicode, array), as well as PyObject_Get/Set/DelItem, need it to raise 
IndexError.

   Raising IndexError also benefits sequences implemented in Python, which can 
simply do:

   def __getitem__(self, idx):
       if isinstance(idx, slice):
           return self._get_slice(idx)
       idx = operator.index(idx)  # Will raise IndexError on overflow
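
A fuller sketch of the pattern above, assuming a current Python 3 interpreter. LazyRange and its methods are hypothetical. One difference from the patch as described: in released Pythons operator.index() returns large integers unchanged rather than raising IndexError, so this sketch does its own bounds check.

```python
import operator


class LazyRange(object):
    """Hypothetical sequence of the integers 0..n-1 that stores no items."""

    def __init__(self, n):
        self._n = n

    def __len__(self):
        return self._n

    def __getitem__(self, idx):
        if isinstance(idx, slice):
            # slice.indices() clips the bounds to the sequence length.
            start, stop, step = idx.indices(self._n)
            return list(range(start, stop, step))
        # Accept any object implementing __index__; floats raise TypeError.
        idx = operator.index(idx)
        if idx < 0:
            idx += self._n
        if not 0 <= idx < self._n:
            raise IndexError("index out of range")
        return idx
```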

   A second API function PyNumber_SliceIndex is added so that the clipping 
semantics are still available where needed and _PyEval_SliceIndex is modified 
to use the new public API. This is exposed to Python code as 
operator.sliceindex().

   With the redesigned C API, the *only* code that calls the nb_index slot 
directly is the two functions in abstract.c. Everything else uses one or the 
other of those interfaces. Code duplication was significantly reduced as a 
result, and it should be much easier for 3rd party C libraries to do what they 
need to do (i.e. implementing mp_subscript slots).

2. Redefine nb_index to return a PyObject *

   Returning the PyInt/PyLong objects directly from nb_index greatly 
simplified the implementation of the nb_index methods for the affected 
classes. For classic classes, instance_index could be modified to simply 
return the result of calling __index__, as could slot_nb_index for new-style 
classes. For the standard library classes, the existing int_int method, and 
the long_long method could be used instead of needing new functions.

   This convenience should hold true for extension classes - instead of 
needing to implement __index__ separately, they should be able to reuse their 
existing __int__ or __long__ implementations.

   The other benefit is that the logic to downconvert to Py_ssize_t that was 
formerly invoked by long's __index__ method is now instead invoked by 
PyNumber_Index and PyNumber_SliceIndex. This means that directly calling an 
__index__() method allows large long results to be passed through unaffected, 
but calling the indexing operator will raise IndexError if the long is outside 
the memo

Re: [Python-Dev] New miniconf module

2006-07-29 Thread David Hopwood
Nick Coghlan wrote:
> Sylvain Fourmanoit wrote:
>>Armin Rigo wrote:
>>
>>>If it goes in that direction, I'd suggest to rename the module to give
>>>it a name closer to existing persistence-related modules already in the
>>>stdlib.
>>
>>I am not especially fond of the current miniconf name either; I didn't 
>>find something more suitable, yet evocative of what it does; I would be 
>>glad to hear any suggestion you or the rest of the developers would have.
> 
> pyson :)

Following the pattern of JSON, it would be "PYON" (PYthon Object Notation).

-- 
David Hopwood <[EMAIL PROTECTED]>




Re: [Python-Dev] Bad interaction of __index__ and sequence repeat

2006-07-29 Thread Nick Coghlan
Nick Coghlan wrote:
>   The other benefit is that the logic to downconvert to Py_ssize_t that 
> was formerly invoked by long's __index__ method is now instead invoked 
> by PyNumber_Index and PyNumber_SliceIndex. This means that directly 
> calling an __index__() method allows large long results to be passed 
> through unaffected, but calling the indexing operator will raise 
> IndexError if the long is outside the memory address space:
> 
>   (2 ** 100).__index__() == (2**100)  # This works
>   operator.index(2**100)  # This raises IndexError
> 
> The patch includes additions to test_index.py to cover these limit 
> cases, as well as the necessary updates to the C API and operator module 
> documentation.

I forgot to mention the main benefit of this: when working with a 
pseudo-sequence rather than a concrete one, __index__() can be used directly 
to ensure you are working with integral data types while still allowing access 
to the full range of representable integer values.

operator.index is available for when what you have really is a concrete data 
set that is limited to the memory capacity of a single machine, and 
operator.sliceindex for when you want to clamp at the memory address space 
limits rather than raising an exception.
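
The three behaviours described here (pass through, raise, clamp) can be sketched in pure Python. full_index, index_or_raise and clamped_index are hypothetical stand-ins for a direct __index__() call and for what the patch calls operator.index and operator.sliceindex; they are not functions in any released operator module. sys.maxsize is assumed to match the Py_ssize_t limit of the running build.

```python
import sys

SSIZE_MAX = sys.maxsize          # assumed upper Py_ssize_t limit
SSIZE_MIN = -sys.maxsize - 1     # assumed lower Py_ssize_t limit


def full_index(obj):
    """Call __index__ directly: the full integer range passes through."""
    return obj.__index__()


def index_or_raise(obj):
    """Raise IndexError when the value does not fit in a Py_ssize_t."""
    value = obj.__index__()
    if not SSIZE_MIN <= value <= SSIZE_MAX:
        raise IndexError("index does not fit in a Py_ssize_t")
    return value


def clamped_index(obj):
    """Clamp the value to the Py_ssize_t range instead of raising."""
    return max(SSIZE_MIN, min(SSIZE_MAX, obj.__index__()))
```

Only the second and third helpers tie the result to the address space; the first is the one a pseudo-sequence larger than memory would use.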

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org


[Python-Dev] Testing Socket Timeouts patch 1519025

2006-07-29 Thread Tony Nelson
I'm trying to write a test for my Socket Timeouts patch [1], which fixes
signal handling (notably Ctl-C == SIGINT == KeyboardInterrupt) on socket
operations using a timeout.  I don't see a portable way to send a signal,
and asking the test runner to press Ctl-C is a non-starter.  A "real"
signal is needed to interrupt the select() (or equivalent) call, because
that's what wasn't being handled correctly.  The bug should happen on the
other platforms I don't know how to test on.

Is there a portable way to send a signal?  SIGINT would be best, but
another signal (such as SIGALRM) would do, I think.

If not, should I write the test to only work on systems implementing
SIGALRM, the signal I'm using now, or implementing kill(), or what?

[1] 

TonyN.:'   
  '  


Re: [Python-Dev] Bad interaction of __index__ and sequence repeat

2006-07-29 Thread Nick Coghlan
Nick Coghlan wrote:
> Armin Rigo wrote:
>> Hi,
>>
>> There is an oversight in the design of __index__() that only just
>> surfaced :-(  It is responsible for the following behavior, on a 32-bit
>> machine with >= 2GB of RAM:
>>
>> >>> s = 'x' * (2**100)   # works!
>> >>> len(s)
>> 2147483647
>>
>> This is because PySequence_Repeat(v, w) works by applying w.__index__ in
>> order to call v->sq_repeat.  However, __index__ is defined to clip the
>> result to fit in a Py_ssize_t.  This means that the above problem exists
>> with all sequences, not just strings, given enough RAM to create such
>> sequences with 2147483647 items.
>>
>> For reference, in 2.4 we correctly get an OverflowError.
>>
>> Argh!  What should be done about it?
> 
> I've now got a patch on SF that aims to fix this properly [1].

I revised this patch to further reduce the code duplication associated with 
the indexing code in the standard library.

The patch now has three new functions in the abstract C API:

   PyNumber_Index (used in a dozen or so places)
 - raises IndexError on overflow
   PyNumber_AsSsize_t (used in 3 places)
 - raises OverflowError on overflow
   PyNumber_AsClippedSsize_t() (used once, by _PyEval_SliceIndex)
 - clips to PY_SSIZE_T_MIN/MAX on overflow

All 3 have an int * output argument allowing type errors to be flagged 
directly to the caller rather than through PyErr_Occurred().

Of the 3, only PyNumber_Index is exposed through the operator module.

Probably the most interesting thing now would be for Travis to review it, and 
see whether it makes things easier to handle for the Numeric scalar types 
(given the amount of code the patch deleted from the builtin and standard 
library data types, hopefully the benefits to Numeric will be comparable).

Cheers,
Nick.

[1] http://www.python.org/sf/1530738


-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org


[Python-Dev] test_uuid

2006-07-29 Thread Neal Norwitz
Ping,

I just checked in a change to disable testing 2 uuid functions
(_ifconfig_get_node and unixdll_getnode) that fail on many platforms.
Here's the message:

"""
Disable these tests until they are reliable across platforms. These
problems may mask more important, real problems.

One or both methods are known to fail on: Solaris, OpenBSD, Debian, Ubuntu.
They pass on Windows and some Linux boxes.
"""

Can you fix these issues or at least provide guidance on how they should be fixed?

n


Re: [Python-Dev] uuid test suite failing

2006-07-29 Thread Ka-Ping Yee
On Fri, 28 Jul 2006, Neal Norwitz wrote:
> This only fixes 1 of the 2 failures in test_uuid.  The other one is
> due to _unixdll_getnode() failing.  This is because
> _uuid_generate_time is None because we couldn't find it in the uuid
> library.  This is just broken, not sure if it's the code or the test
> though.  We should handle the case if _uuid_generate_time and the
> others are None better.  I don't know what to do in this case.

The design intention is this: each of the various *_getnode() functions
is supposed to work on the platform for which it was written.  For
example, _windll_getnode() is supposed to work on Windows, and will
raise an exception on other platforms; if it raises an exception on
Windows, something is wrong (the code's expectations of the OS are
not met).  When uuid_generate_time is unavailable, _unixdll_getnode()
is supposed to fail.

The getnode() function is just supposed to get an available MAC
address; that's why it catches any exceptions raised by the
*_getnode() functions.


-- ?!ng


Re: [Python-Dev] test_uuid

2006-07-29 Thread Ka-Ping Yee
On Sat, 29 Jul 2006, Neal Norwitz wrote:
> I just checked in a change to disable testing 2 uuid functions
> (_ifconfig_get_node and unixdll_getnode) that fail on many platforms.
> Here's the message:
>
> """
> Disable these tests until they are reliable across platforms. These
> problems may mask more important, real problems.
>
> One or both methods are known to fail on: Solaris, OpenBSD, Debian, Ubuntu.
> They pass on Windows and some Linux boxes.
> """

_ifconfig_get_node() should work on all Linuxes.  (Thanks for fixing
it to work on more types of Unix.)

It's okay for unixdll_getnode to fail when the necessary shared library
is not available.

Ideally, test_uuid should serve as a record of which platforms we expect
these routines to work on.  The uuid module as a whole isn't broken if
one of these routines fails; it just means that we don't have complete
platform coverage and/or test_uuid has inaccurate expectations of which
routines work on which platforms.


-- ?!ng


Re: [Python-Dev] Testing Socket Timeouts patch 1519025

2006-07-29 Thread Josiah Carlson

Tony Nelson <[EMAIL PROTECTED]> wrote:
> 
> I'm trying to write a test for my Socket Timeouts patch [1], which fixes
> signal handling (notably Ctl-C == SIGINT == KeyboardInterrupt) on socket
> operations using a timeout.  I don't see a portable way to send a signal,
> and asking the test runner to press Ctl-C is a non-starter.  A "real"
> signal is needed to interrupt the select() (or equivalent) call, because
> that's what wasn't being handled correctly.  The bug should happen on the
> other platforms I don't know how to test on.
> 
> Is there a portable way to send a signal?  SIGINT would be best, but
> another signal (such as SIGALRM) would do, I think.

According to my (limited) research on signals, Windows signal support is
horrible.  I have not been able to have Python send signals of any kind
other than SIGABRT, and then only to the currently running process,
which kills it (regardless of whether you have a signal handler or not).

> If not, should I write the test to only work on systems implementing
> SIGALRM, the signal I'm using now, or implementing kill(), or what?

I think that most non-Windows platforms should have non-braindead signal
support, though the signal module seems to be severely lacking in
sending any signal except for SIGALRM, and the os module has its fingers
on SIGABRT.


If someone is looking for a project for 2.6 that digs into all sorts of
platform-specific nastiness, they could add actual signal sending to the
signal module (at least for unix systems).

 - Josiah



Re: [Python-Dev] Testing Socket Timeouts patch 1519025

2006-07-29 Thread Jean-Paul Calderone
On Sat, 29 Jul 2006 14:38:38 -0700, Josiah Carlson <[EMAIL PROTECTED]> wrote:
>
>If someone is looking for a project for 2.6 that digs into all sorts of
>platform-specific nastiness, they could add actual signal sending to the
>signal module (at least for unix systems).
>

Maybe I am missing something obvious, but what is necessary beyond
os.kill()?

What /would/ be useful is a complete sigaction wrapper, but that's a
completely separate topic.
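
A minimal sketch of sending a signal with os.kill(), assuming a Unix-like platform and that the code runs in the main thread. The names _record and signal_self are hypothetical, and SIGUSR1 is chosen only because it is harmless to catch.

```python
import os
import signal
import time

received = []


def _record(signum, frame):
    # Runs in the main thread once the interpreter notices the signal.
    received.append(signum)


def signal_self(signum):
    """Send signum to this process and wait briefly for the handler."""
    previous = signal.signal(signum, _record)
    try:
        os.kill(os.getpid(), signum)
        deadline = time.time() + 2.0
        while not received and time.time() < deadline:
            time.sleep(0.01)
    finally:
        # Restore whatever handler was installed before.
        signal.signal(signum, previous)
    return list(received)


if hasattr(signal, "SIGUSR1"):
    delivered = signal_self(signal.SIGUSR1)
else:
    delivered = None   # platform without SIGUSR1, e.g. Windows
```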

Jean-Paul


Re: [Python-Dev] Testing Socket Timeouts patch 1519025

2006-07-29 Thread Tony Nelson
At 2:38 PM -0700 7/29/06, Josiah Carlson wrote:
>Tony Nelson <[EMAIL PROTECTED]> wrote:
>>
>> I'm trying to write a test for my Socket Timeouts patch [1], which fixes
>> signal handling (notably Ctl-C == SIGINT == KeyboardInterrupt) on socket
>> operations using a timeout.  I don't see a portable way to send a signal,
>> and asking the test runner to press Ctl-C is a non-starter.  A "real"
>> signal is needed to interrupt the select() (or equivalent) call, because
>> that's what wasn't being handled correctly.  The bug should happen on the
>> other platforms I don't know how to test on.
>>
>> Is there a portable way to send a signal?  SIGINT would be best, but
>> another signal (such as SIGALRM) would do, I think.
>
>According to my (limited) research on signals, Windows signal support is
>horrible.  I have not been able to have Python send signals of any kind
>other than SIGABRT, and then only to the currently running process,
>which kills it (regardless of whether you have a signal handler or not).

Hmm, OK, darn, thanks.  MSWindows does allow users to press Ctl-C to send a
KeyboardInterrupt, so it's just too bad if I can't find a way to test it
from a script.


>> If not, should I write the test to only work on systems implementing
>> SIGALRM, the signal I'm using now, or implementing kill(), or what?
>
>I think that most non-Windows platforms should have non-braindead signal
>support, though the signal module seems to be severely lacking in
>sending any signal except for SIGALRM, and the os module has its fingers
>on SIGABRT.

The test now checks "hasattr(signal, 'alarm')" before proceeding, so at
least it won't die horribly.
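
A rough sketch of what such a guarded test could look like. This is not the test from the patch: AlarmFired, _on_alarm and recv_interrupted_by_alarm are hypothetical names, and the sketch assumes a current Python 3 on a platform that provides socket.socketpair() and that the code runs in the main thread.

```python
import signal
import socket


class AlarmFired(Exception):
    """Raised by the SIGALRM handler to interrupt the blocked call."""


def _on_alarm(signum, frame):
    raise AlarmFired()


def recv_interrupted_by_alarm(timeout=5.0):
    """Return True if SIGALRM interrupts recv(), None if untestable here."""
    if not hasattr(signal, "alarm"):
        return None
    reader, writer = socket.socketpair()
    reader.settimeout(timeout)
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    try:
        signal.alarm(1)              # deliver SIGALRM in about one second
        try:
            reader.recv(1)           # nothing is ever written: this blocks
        except AlarmFired:
            return True
        except socket.timeout:
            return False             # the signal did not get through
        return False
    finally:
        signal.alarm(0)              # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)
        reader.close()
        writer.close()
```

Everything stays in one thread, which matches the reason given below for preferring alarm() over kill().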


>If someone is looking for a project for 2.6 that digs into all sorts of
>platform-specific nastiness, they could add actual signal sending to the
>signal module (at least for unix systems).

Isn't signal sending the province of kill(2) (or os.kill() in Python)?
Not that I know much about it.

BTW, I picked SIGALRM because I could do it all with one thread.  Reading
POSIX, ISTM that if I sent the signal from another thread, it would bounce
off that thread to the main thread during the call to kill(), at which
point I got the willies.  OTOH, if kill() is more widely available than
alarm(), I'll give it a try, but going by the docs, I'd say it isn't.

TonyN.:'   
  '  