Re: [Numpy-discussion] Re: [Python-Dev] Re: Numeric life as I see it

2005-02-16 Thread konrad . hinsen
On 10.02.2005, at 05:36, Guido van Rossum wrote:
> And why would a Matrix need to inherit from a C-array? Wouldn't it
> make more sense from an OO POV for the Matrix to *have* a C-array
> without *being* one?

Definitely. Most array operations make no sense on matrices. And  
matrices are limited to two dimensions. Making Matrix a subclass of  
Array would be inheritance for implementation while removing 90% of the  
interface.

On the other hand, a Matrix object is perfectly defined by its  
behaviour and independent of its implementation. One could perfectly  
well implement one using Python lists or dictionaries, even though that  
would be pointless from a performance point of view.
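
To make the has-a point concrete, here is a minimal sketch of a Matrix that keeps its storage private and exposes only matrix behaviour. The class layout and method names are assumptions chosen for illustration; nothing in the thread prescribes them, and the list-of-lists storage is deliberately the "pointless from a performance point of view" variant.

```python
class Matrix:
    """Two-dimensional matrix that *has* a storage object rather than *being* one."""

    def __init__(self, rows):
        rows = [list(row) for row in rows]
        if not rows or any(len(row) != len(rows[0]) for row in rows):
            raise ValueError("a matrix needs at least one row, all of equal length")
        self._rows = rows  # private storage; could be swapped for a C array

    @property
    def shape(self):
        return len(self._rows), len(self._rows[0])

    def __eq__(self, other):
        return isinstance(other, Matrix) and self._rows == other._rows

    def __mul__(self, other):
        """Matrix product (not the elementwise product an array would give)."""
        if not isinstance(other, Matrix):
            return NotImplemented
        if self.shape[1] != other.shape[0]:
            raise ValueError("inner dimensions do not match")
        columns = list(zip(*other._rows))
        return Matrix([[sum(a * b for a, b in zip(row, col)) for col in columns]
                       for row in self._rows])

    def transpose(self):
        return Matrix(zip(*self._rows))
```

Because callers only ever see shape, * and transpose, the storage could later be replaced by an array object without changing any code that uses Matrix.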

Konrad.
--
 
---
Konrad Hinsen
Laboratoire Leon Brillouin, CEA Saclay,
91191 Gif-sur-Yvette Cedex, France
Tel.: +33-1 69 08 79 25
Fax: +33-1 69 08 82 61
E-Mail: [EMAIL PROTECTED]
 
---

___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Re: [Numpy-discussion] Re: Numeric life as I see it

2005-02-16 Thread konrad . hinsen
On 10.02.2005, at 05:09, Travis Oliphant wrote:
> I'm not sure I agree.  The ufuncobject is the only place where this
> concern existed (should we trip OverFlow, ZeroDivision, etc. errors
> during array math).   Numarray introduced and implemented the concept
> of error modes that can be pushed and popped.  I believe this is the
> right solution for the ufuncobject.

Indeed. Note also that the ufunc stuff is less critical to agree on  
than the array data structure. Anyone unhappy with ufuncs could write  
their own module and use it instead. It would be the data structure and  
its access rules that fix the structure of all the code that uses it,  
so that's what needs to be acceptable to everyone.
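
The push/pop error-mode idea mentioned in the quoted text can be modelled in a few lines. This is only a sketch of the concept: the class name, the mode names and the divide helper are hypothetical and are not numarray's actual API.

```python
import math
import warnings


class ErrorModes:
    """Hypothetical stack of error-handling modes (not numarray's real API)."""

    VALID = {"ignore", "warn", "raise"}

    def __init__(self):
        # The bottom entry is the default and can never be popped.
        self._stack = [{"dividebyzero": "warn", "overflow": "warn"}]

    @property
    def current(self):
        return self._stack[-1]

    def push(self, **modes):
        new = dict(self.current)
        for name, mode in modes.items():
            if name not in new or mode not in self.VALID:
                raise ValueError(f"bad error mode: {name}={mode!r}")
            new[name] = mode
        self._stack.append(new)

    def pop(self):
        if len(self._stack) == 1:
            raise RuntimeError("cannot pop the default error mode")
        return self._stack.pop()


def divide(a, b, modes):
    """Divide two floats, consulting the active mode when b is zero."""
    if b != 0:
        return a / b
    mode = modes.current["dividebyzero"]
    if mode == "raise":
        raise ZeroDivisionError("division by zero in array math")
    if mode == "warn":
        warnings.warn("division by zero in array math", RuntimeWarning)
    return math.copysign(math.inf, a) if a else math.nan
```

A caller would push a stricter or looser mode around one computation and pop it afterwards, which leaves the surrounding code's settings untouched.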

> One question we are pursuing is could the arrayobject get into the
> core without a particular ufunc object.   Most see this as
> sub-optimal, but maybe it is the only way.

Since all the arithmetic operations are in ufunc, that would be a
suboptimal solution, but indeed still a workable one.

> I appreciate some of what Paul is saying here, but I'm not fully
> convinced that this is still true with Python 2.2 and up new-style
> c-types.   The concerns seem to be over the fact that you have to
> re-implement everything in the sub-class because the base-class will
> always return one of its objects instead of a sub-class object.

I'd say that such discussions should be postponed until someone  
proposes a good use for subclassing arrays. Matrices are not one, in my  
opinion.
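
The concern in the quoted paragraph (a base class returning its own type from inherited operations) can be seen with any built-in type. The sketch below uses list rather than an array type, and both class names are made up for illustration.

```python
class PlainSub(list):
    """Subclass that inherits every operation unchanged."""


class WrappingSub(list):
    """Subclass that re-wraps results so they keep the subclass type."""

    def __add__(self, other):
        return type(self)(super().__add__(other))

    def __getitem__(self, index):
        result = super().__getitem__(index)
        # Only slices produce a new sequence that needs re-wrapping.
        return type(self)(result) if isinstance(index, slice) else result


plain_sum = PlainSub([1]) + PlainSub([2])           # falls back to a plain list
wrapped_sum = WrappingSub([1]) + WrappingSub([2])   # stays a WrappingSub
```

Every operation that should preserve the subclass needs an override like the two shown, which is the re-implementation burden being discussed.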

Konrad.


[Python-Dev] Re: [Numpy-discussion] Re: Numeric life as I see it

2005-02-16 Thread Peter Verveer
On Feb 10, 2005, at 10:30 AM, Travis Oliphant wrote:

>>> One question we are pursuing is could the arrayobject get into the
>>> core without a particular ufunc object.   Most see this as
>>> sub-optimal, but maybe it is the only way.
>>
>> Since all the arithmetic operations are in ufunc, that would be a
>> suboptimal solution, but indeed still a workable one.
>
> I think replacing basic number operations of the arrayobject should be
> simple, so perhaps a default ufunc object could be worked out for
> inclusion.

I agree. Getting it into the core is, among other things, intended to give
it broad access, not just to hard-core numeric people. For many uses
(including many of my simpler scripts) you don't need the more exotic
functionality of ufuncs. You could just do with implementing the
standard math functions, possibly leaving out things like reduce. That
would be very easy to implement.

>>> I appreciate some of what Paul is saying here, but I'm not fully
>>> convinced that this is still true with Python 2.2 and up new-style
>>> c-types.   The concerns seem to be over the fact that you have to
>>> re-implement everything in the sub-class because the base-class will
>>> always return one of its objects instead of a sub-class object.
>>
>> I'd say that such discussions should be postponed until someone
>> proposes a good use for subclassing arrays. Matrices are not one, in
>> my opinion.
>
> Agreed.  It is not critical to what I am doing, and I obviously
> need more understanding before tackling such things.  Numeric3 uses
> the new c-type largely because of the nice getsets table which is
> separate from the methods table.  This replaces the rather ugly
> C-functions getattr and setattr.

I would agree that sub-classing arrays might not be worth the trouble.

Peter


[Python-Dev] RE: [Numpy-discussion] Numeric life as I see it

2005-02-16 Thread Perry Greenfield
Paul Dubois wrote:

>
> Aside: While I am at it, let me reiterate what I have said to the other
> developers privately: there is NO value to inheriting from the array
> class. Don't try to achieve that capability if it costs anything, even
> just effort, because it buys you nothing. Those of you who keep
> remarking on this as if it would simply haven't thought it through IMHO.
> It sounds so intellectually appealing that David Ascher and I had a
> version of Numeric that almost did it before we realized our folly.
>
To be contrarian, we did find great benefit (at least initially) for
inheritance for developing the record array and character array classes
since they share so many structural operations (indexing, slicing,
transposes, concatenation, etc.) with numeric arrays. It's possible that
the approach
that Travis is considering doesn't need to use inheritance to accomplish
this (I don't know enough about the details yet), but it sure did save a
lot of duplication of implementation.

I do understand what you are getting at. Any numerical array inheritance
generally forces one to reimplement all ufuncs and such, and that does
make it less useful in that case (though I still wonder if it still isn't
better than the alternatives)

Perry Greenfield




[Python-Dev] Py2.3.1

2005-02-16 Thread apoline juliet obina
iis it "pydos" ? your net add?/
   


[Python-Dev] Windows Low Fragementation Heap yields speedup of ~15%

2005-02-16 Thread Gfeller Martin
Dear all,

I'm running a large Zope application on a machine with one 1 GHz CPU and
1 GB of memory under Windows XP Professional, using Zope 2.7.3 and
Python 2.3.4. The application typically builds large lists by appending
and extending them.

We regularly observed that using a given functionality a 
second time using the same process was much slower (50%) 
than when it ran the first time after startup. 
This behavior greatly improved with Python 2.3 (thanks 
to the improved Python object allocator, I presume). 

Nevertheless, I tried to convert the heap used by Python 
to a Windows Low Fragmentation Heap (available on XP 
and 2003 Server). This improved the overall run time 
of a typical CPU-intensive report by about 15% 
(overall run time is in the 5 minutes range), with the
same memory consumption.

I consider 15% significant enough to let you know about it.

For information about the Low Fragmentation Heap, see
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/memory/base/low_fragmentation_heap.asp

Best regards,
Martin 

PS: Since I don't speak C, I used ctypes to convert all 
heaps in the process to LFH (I don't know how to determine
which one is the C heap).
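
The message does not include the ctypes code, so the following is only a sketch of how the conversion described in the PS might look with a current ctypes. It assumes the Win32 calls GetProcessHeaps and HeapSetInformation, with information class 0 (HeapCompatibilityInformation) and value 2 requesting the LFH; the function name is made up here, and on platforms other than Windows the function does nothing.

```python
import ctypes
import sys

HEAP_COMPATIBILITY_INFORMATION = 0  # HeapCompatibilityInformation info class
HEAP_LFH = 2                        # compatibility value that requests the LFH


def enable_lfh_on_all_heaps():
    """Ask Windows to use the LFH for every heap in this process.

    Returns (converted, total); (0, 0) on platforms other than Windows.
    """
    if sys.platform != "win32":
        return 0, 0

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetProcessHeaps.argtypes = [ctypes.c_ulong,
                                         ctypes.POINTER(ctypes.c_void_p)]
    kernel32.GetProcessHeaps.restype = ctypes.c_ulong
    kernel32.HeapSetInformation.argtypes = [ctypes.c_void_p, ctypes.c_int,
                                            ctypes.c_void_p, ctypes.c_size_t]
    kernel32.HeapSetInformation.restype = ctypes.c_int

    # A first call with a zero-length buffer reports how many heaps exist.
    wanted = kernel32.GetProcessHeaps(0, None)
    handles = (ctypes.c_void_p * wanted)()
    total = min(kernel32.GetProcessHeaps(wanted, handles), wanted)

    flag = ctypes.c_ulong(HEAP_LFH)
    converted = 0
    for handle in handles[:total]:
        # A nonzero return value means the heap accepted the setting.
        if kernel32.HeapSetInformation(handle, HEAP_COMPATIBILITY_INFORMATION,
                                       ctypes.byref(flag), ctypes.sizeof(flag)):
            converted += 1
    return converted, total
```

Converting every heap, as the PS describes, avoids having to identify which handle belongs to the C runtime; heaps that reject the setting are simply not counted.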





COMIT AG
Risk Management Systems
Pflanzschulstrasse 7 
CH-8004 Zürich 

Telefon +41 (44) 1 298 92 84 

http://www.comit.ch 
http://www.quantax.com - Quantax Trading and Risk System



[Python-Dev] builtin_id() returns negative numbers

2005-02-16 Thread Richard Brodie
> Maybe it's just a wart we have to live with now; OTOH,
> the docs explicitly warn that id() may return a long, so any code
> relying on "short int"-ness has always been relying on an
> implementation quirk.

Well, the docs say that %x does unsigned conversion, so they've
been relying on an implementation quirk as well ;)

Would it be practical to add new conversion syntax to string 
interpolation? Like, for example, %p as an unsigned hex number
the same size as (void *). 

Otherwise, unless I misunderstand integer unification, one would
just have to strike the distinction between, say, %d and %u.
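
Without new conversion syntax, the same effect can be had by masking the value to pointer width before formatting. A minimal sketch, assuming the pointer size reported by the struct module is the one wanted; the helper name is hypothetical.

```python
import struct

POINTER_BITS = struct.calcsize("P") * 8  # width of a C void* on this build


def unsigned_hex(value, bits=POINTER_BITS):
    """Format value as hex, reading negatives as unsigned two's complement."""
    return "%x" % (value & ((1 << bits) - 1))
```

For example, unsigned_hex(-1, 32) gives "ffffffff", while a non-negative value is formatted exactly as %x would format it.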




Re: [Python-Dev] subclassing PyCFunction_Type

2005-02-16 Thread Michael Hudson
Nick Rasmussen <[EMAIL PROTECTED]> writes:

[five days ago]

> Should boost::python functions be modified in some way to show
> up as builtin function types or is the right fix really to patch
> pydoc?

My heart leans towards the latter.

> Is PyCFunction_Type intended to be subclassable?

Doesn't look like it, does it? :) More seriously, "no".

Cheers,
mwh

-- 
  ARTHUR:  Don't ask me how it works or I'll start to whimper.
   -- The Hitch-Hikers Guide to the Galaxy, Episode 11


[Python-Dev] subclassing PyCFunction_Type

2005-02-16 Thread Nick Rasmussen
tommy said that this would be the best place to ask
this question

I'm trying to get functions wrapped via boost to show
up as builtin types so that pydoc includes them when
documenting the module containing them.  Right now
boost python functions are created using a PyTypeObject
such that when inspect.isbuiltin does:

    return isinstance(object, types.BuiltinFunctionType)

isinstance returns 0.

Initially I had just modified a local pydoc to document all
functions with unknown source modules (since the module can't
be deduced from non-python functions), but I figured that
the right fix was to get boost::python functions to correctly
show up as builtins, so I tried setting PyCFunction_Type as the
boost function type object's tp_base, which worked fine for me
using linux on amd64, but when my patch was tried out on other
platforms, it ran into regression test failures:

http://mail.python.org/pipermail/c++-sig/2005-February/008545.html

So I have some questions:

Should boost::python functions be modified in some way to show
up as builtin function types or is the right fix really to patch
pydoc?

Is PyCFunction_Type intended to be subclassable?  I noticed that
it does not have Py_TPFLAGS_BASETYPE set in its tp_flags.  Also,
PyCFunction_Type has Py_TPFLAGS_HAVE_GC, and the assertion failures
in the testsuite seemed to be centered around object allocation and
garbage collection. Is there something related to subclassing a
gc-aware class that needs to be happening? (Currently the boost type
object doesn't support garbage collection.)

If subclassing PyCFunction_Type isn't the right way to make these
functions be considered as builtin functions, what is?

-nick




Re: [Python-Dev] subclassing PyCFunction_Type

2005-02-16 Thread Phillip J. Eby
At 02:32 PM 2/11/05 -0800, Nick Rasmussen wrote:
> tommy said that this would be the best place to ask
> this question
>
> I'm trying to get functions wrapped via boost to show
> up as builtin types so that pydoc includes them when
> documenting the module containing them.  Right now
> boost python functions are created using a PyTypeObject
> such that when inspect.isbuiltin does:
>
>     return isinstance(object, types.BuiltinFunctionType)

FYI, this may not be the "right" way to do this, but since 2.3 
'isinstance()' looks at an object's __class__ rather than its type(), so 
you could perhaps include a '__class__' descriptor in your method type that 
returns BuiltinFunctionType and see if that works.

It's a kludge, but it might let your code work with existing versions of 
Python.
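
The mechanism can be checked in isolation. The sketch below assumes a current CPython, where isinstance() still consults __class__ when the real type does not match; the class name is made up for the example.

```python
import inspect
import types


class PretendBuiltin:
    """Object whose __class__ attribute claims to be a builtin function."""

    __class__ = property(lambda self: types.BuiltinFunctionType)


obj = PretendBuiltin()
real_type = type(obj)                                        # unaffected by the descriptor
claims_builtin = isinstance(obj, types.BuiltinFunctionType)  # True via __class__
seen_by_inspect = inspect.isbuiltin(obj)                     # same isinstance test
```

type() still reports the real class, so only code that goes through isinstance() is affected by the descriptor.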



Re: [Python-Dev] subclassing PyCFunction_Type

2005-02-16 Thread Bob Ippolito
On Feb 16, 2005, at 11:02, Phillip J. Eby wrote:
> At 02:32 PM 2/11/05 -0800, Nick Rasmussen wrote:
>> tommy said that this would be the best place to ask
>> this question
>>
>> I'm trying to get functions wrapped via boost to show
>> up as builtin types so that pydoc includes them when
>> documenting the module containing them.  Right now
>> boost python functions are created using a PyTypeObject
>> such that when inspect.isbuiltin does:
>>
>>     return isinstance(object, types.BuiltinFunctionType)
>
> FYI, this may not be the "right" way to do this, but since 2.3
> 'isinstance()' looks at an object's __class__ rather than its type(),
> so you could perhaps include a '__class__' descriptor in your method
> type that returns BuiltinFunctionType and see if that works.
>
> It's a kludge, but it might let your code work with existing versions
> of Python.

It works in Python 2.3.0:

import types

class FakeBuiltin(object):
    __doc__ = property(lambda self: self.doc)
    __name__ = property(lambda self: self.name)
    __self__ = property(lambda self: None)
    __class__ = property(lambda self: types.BuiltinFunctionType)

    def __init__(self, name, doc):
        self.name = name
        self.doc = doc

>>> help(FakeBuiltin("name", "name(foo, bar, baz) -> rval"))
Help on built-in function name:

name(...)
    name(foo, bar, baz) -> rval

-bob


Re: [Python-Dev] subclassing PyCFunction_Type

2005-02-16 Thread Phillip J. Eby
At 11:26 AM 2/16/05 -0500, Bob Ippolito wrote:
> >>> help(FakeBuiltin("name", "name(foo, bar, baz) -> rval"))
> Help on built-in function name:
> name(...)
>     name(foo, bar, baz) -> rval

If you wanted to be even more ambitious, you could return FunctionType and 
have a fake func_code so pydoc will be able to see the argument signature 
directly.  :)



Re: [Python-Dev] subclassing PyCFunction_Type

2005-02-16 Thread Bob Ippolito
On Feb 16, 2005, at 11:43, Phillip J. Eby wrote:
> At 11:26 AM 2/16/05 -0500, Bob Ippolito wrote:
>> >>> help(FakeBuiltin("name", "name(foo, bar, baz) -> rval"))
>> Help on built-in function name:
>> name(...)
>>     name(foo, bar, baz) -> rval
>
> If you wanted to be even more ambitious, you could return FunctionType
> and have a fake func_code so pydoc will be able to see the argument
> signature directly.  :)

I was thinking that too, but I didn't have the energy to code it in an 
email :)

-bob


[Python-Dev] string find(substring) vs. substring in string

2005-02-16 Thread Fredrik Lundh
any special reason why "in" is faster if the substring is found, but
a lot slower if it's not in there?

timeit -s "s = 'not there'*100" "s.find('not there') != -1"
100 loops, best of 3: 0.749 usec per loop

timeit -s "s = 'not there'*100" "'not there' in s"
1000 loops, best of 3: 0.122 usec per loop

timeit -s "s = 'not the xyz'*100" "s.find('not there') != -1"
10 loops, best of 3: 7.03 usec per loop

timeit -s "s = 'not the xyz'*100" "'not there' in s"
1 loops, best of 3: 25.9 usec per loop



ps. btw, it's about time we did something about this:

timeit -s "s = 'not the xyz'*100" -s "import re; p = re.compile('not there')" "p.search(s)"
10 loops, best of 3: 5.72 usec per loop 
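
The measurements above can be repeated from within Python using the timeit module. This is a sketch only: the function name is made up, the lambdas add call overhead that the command-line form does not have, and later CPython releases changed the string search implementation, so the 2005 ratios should not be expected to reproduce.

```python
import re
import timeit


def time_search(haystack, needle, number=20000):
    """Return seconds per call for three ways of testing containment."""
    pattern = re.compile(re.escape(needle))  # compiled outside the timed loop
    candidates = {
        "find": lambda: haystack.find(needle) != -1,
        "in": lambda: needle in haystack,
        "re": lambda: pattern.search(haystack) is not None,
    }
    return {name: min(timeit.repeat(fn, number=number, repeat=3)) / number
            for name, fn in candidates.items()}


if __name__ == "__main__":
    for label, hay in [("hit", "not there" * 100), ("miss", "not the xyz" * 100)]:
        for name, secs in time_search(hay, "not there").items():
            print(f"{label:4} {name:4} {secs * 1e6:.3f} usec per call")
```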





RE: [Python-Dev] string find(substring) vs. substring in string

2005-02-16 Thread Batista, Facundo

[Fredrik Lundh]


#- any special reason why "in" is faster if the substring is found, but
#- a lot slower if it's not in there?


Maybe because it stops searching when it finds it?


The time seems to be very dependent on the position of the first match:


  [EMAIL PROTECTED] ~/ota> python /usr/local/lib/python2.3/timeit.py -s "s = 'not there'*100" "'not there' in s"
  100 loops, best of 3: 0.222 usec per loop


  [EMAIL PROTECTED] ~/ota> python /usr/local/lib/python2.3/timeit.py -s "s = 'blah blah'*20 + 'not there'*100" "'not there' in s"

  10 loops, best of 3: 5.54 usec per loop


  [EMAIL PROTECTED] ~/ota> python /usr/local/lib/python2.3/timeit.py -s "s = 'blah blah'*40 + 'not there'*100" "'not there' in s"

  10 loops, best of 3: 10.8 usec per loop



.    Facundo


Bitácora De Vuelo: http://www.taniquetil.com.ar/plog
PyAr - Python Argentina: http://pyar.decode.com.ar/







Re: [Python-Dev] string find(substring) vs. substring in string

2005-02-16 Thread Mike Brown
Fredrik Lundh wrote:
> any special reason why "in" is faster if the substring is found, but
> a lot slower if it's not in there?

Just guessing here, but in general I would think that it would stop searching 
as soon as it found it, whereas until then, it keeps looking, which takes more 
time. But I would also hope that it would be smart enough to know that it 
doesn't need to look past the 2nd character in 'not the xyz' when it is 
searching for 'not there' (due to the lengths of the sequences).


Re: [Python-Dev] string find(substring) vs. substring in string

2005-02-16 Thread A.M. Kuchling
On Wed, Feb 16, 2005 at 01:34:16PM -0700, Mike Brown wrote:
> time. But I would also hope that it would be smart enough to know that it 
> doesn't need to look past the 2nd character in 'not the xyz' when it is 
> searching for 'not there' (due to the lengths of the sequences).

Assuming stringobject.c:string_contains is the right function, the
code looks like this:

    size = PyString_GET_SIZE(el);
    rhs = PyString_AS_STRING(el);
    lhs = PyString_AS_STRING(a);

    /* optimize for a single character */
    if (size == 1)
        return memchr(lhs, *rhs, PyString_GET_SIZE(a)) != NULL;

    end = lhs + (PyString_GET_SIZE(a) - size);
    while (lhs <= end) {
        if (memcmp(lhs++, rhs, size) == 0)
            return 1;
    }

So it's doing a zillion memcmp()s.  I don't think there's a more
efficient way to do this with ANSI C; memmem() is a GNU extension that
searches for blocks of memory.  Perhaps saving some memcmps by writing

 if ((*lhs  == *rhs) && memcmp(lhs++, rhs, size) == 0)

would help.
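
The effect of the first-character test can be modelled in Python by counting how many full comparisons each variant performs. This is a sketch with made-up function names, not the C code itself. One detail matters when carrying the idea back to C: the pointer has to advance on every iteration, whereas in the one-line form above lhs++ sits to the right of && and is skipped whenever the first characters differ.

```python
def contains_compare_everywhere(haystack: bytes, needle: bytes):
    """Compare the whole needle at every offset; return (found, comparisons)."""
    size = len(needle)
    compares = 0
    for start in range(len(haystack) - size + 1):
        compares += 1
        if haystack[start:start + size] == needle:
            return True, compares
    return False, compares


def contains_first_byte_check(haystack: bytes, needle: bytes):
    """Run the full comparison only where the first byte already matches."""
    size = len(needle)
    if size == 0:
        return True, 0
    compares = 0
    first = needle[0]
    for start in range(len(haystack) - size + 1):  # offset advances every time
        if haystack[start] != first:
            continue
        compares += 1
        if haystack[start:start + size] == needle:
            return True, compares
    return False, compares
```

On the benchmark string the second variant only runs a full comparison at offsets where the haystack has an 'n', which is one offset in eleven.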

--amk



Re: [Python-Dev] string find(substring) vs. substring in string

2005-02-16 Thread Guido van Rossum
> Assuming stringobject.c:string_contains is the right function, the
> code looks like this:
> 
>     size = PyString_GET_SIZE(el);
>     rhs = PyString_AS_STRING(el);
>     lhs = PyString_AS_STRING(a);
>
>     /* optimize for a single character */
>     if (size == 1)
>         return memchr(lhs, *rhs, PyString_GET_SIZE(a)) != NULL;
>
>     end = lhs + (PyString_GET_SIZE(a) - size);
>     while (lhs <= end) {
>         if (memcmp(lhs++, rhs, size) == 0)
>             return 1;
>     }
> 
> So it's doing a zillion memcmp()s.  I don't think there's a more
> efficient way to do this with ANSI C; memmem() is a GNU extension that
> searches for blocks of memory.  Perhaps saving some memcmps by writing
> 
>  if ((*lhs  == *rhs) && memcmp(lhs++, rhs, size) == 0)
> 
> would help.

Which is exactly how s.find() wins this race. (I guess it loses when
it's found by having to do the "find" lookup.) Maybe string_contains
should just call string_find_internal()?

And then there's the question of how the re module gets to be faster
still; I suppose it doesn't bother with memcmp() at all.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] string find(substring) vs. substring in string

2005-02-16 Thread Irmen de Jong
Mike Brown wrote:
> Fredrik Lundh wrote:
>> any special reason why "in" is faster if the substring is found, but
>> a lot slower if it's not in there?
>
> Just guessing here, but in general I would think that it would stop searching
> as soon as it found it, whereas until then, it keeps looking, which takes more
> time. But I would also hope that it would be smart enough to know that it
> doesn't need to look past the 2nd character in 'not the xyz' when it is
> searching for 'not there' (due to the lengths of the sequences).

There's the Boyer-Moore string search algorithm which is
allegedly much faster than a simplistic scanning approach,
and I also found this: http://portal.acm.org/citation.cfm?id=79184
So perhaps there's room for improvement :)
--Irmen


[Python-Dev] Re: string find(substring) vs. substring in string

2005-02-16 Thread Fredrik Lundh
A.M. Kuchling wrote:

>> time. But I would also hope that it would be smart enough to know that it
>> doesn't need to look past the 2nd character in 'not the xyz' when it is
>> searching for 'not there' (due to the lengths of the sequences).
>
> Assuming stringobject.c:string_contains is the right function, the
> code looks like this:
>
>     size = PyString_GET_SIZE(el);
>     rhs = PyString_AS_STRING(el);
>     lhs = PyString_AS_STRING(a);
>
>     /* optimize for a single character */
>     if (size == 1)
>         return memchr(lhs, *rhs, PyString_GET_SIZE(a)) != NULL;
>
>     end = lhs + (PyString_GET_SIZE(a) - size);
>     while (lhs <= end) {
>         if (memcmp(lhs++, rhs, size) == 0)
>             return 1;
>     }
>
> So it's doing a zillion memcmp()s.  I don't think there's a more
> efficient way to do this with ANSI C; memmem() is a GNU extension that
> searches for blocks of memory.

oops.  so whoever implemented contains didn't even bother to look at the
find implementation... (which uses the same brute-force algorithm, but a better
implementation...)

> Perhaps saving some memcmps by writing
>
> if ((*lhs  == *rhs) && memcmp(lhs++, rhs, size) == 0)
>
> would help.

memcmp still compiles to REP CMPB on many x86 compilers, and the setup
overhead for memcmp sucks on modern x86 hardware; it's usually better to
write your own bytewise comparison...

(and the fact that we're still using brute-force search algorithms in "find"
is a bit embarrassing -- note that RE outperforms "in" by a factor of five...
guess it's time to finish the split/replace parts of stringlib and produce a
patch... ;-)

 





[Python-Dev] Re: string find(substring) vs. substring in string

2005-02-16 Thread Fredrik Lundh
Mike Brown wrote:
>> any special reason why "in" is faster if the substring is found, but
>> a lot slower if it's not in there?
>
> Just guessing here, but in general I would think that it would stop searching
> as soon as it found it, whereas until then, it keeps looking, which takes more
> time.

the point was that string.find does the same thing, but is much faster in
the "no match" case.

> But I would also hope that it would be smart enough to know that it
> doesn't need to look past the 2nd character in 'not the xyz' when it is
> searching for 'not there' (due to the lengths of the sequences).

note that the target string was "not the xyz"*100, so the search algorithm
surely has to look past the second character ;-)

(btw, the benchmark was taken from jim hugunin's ironpython talk, and
seems to be carefully designed to kill performance also for more advanced
algorithms -- including boyer-moore)

 





[Python-Dev] Re: string find(substring) vs. substring in string

2005-02-16 Thread Fredrik Lundh
Guido van Rossum wrote:

> Which is exactly how s.find() wins this race. (I guess it loses when
> it's found by having to do the "find" lookup.) Maybe string_contains
> should just call string_find_internal()?

I somehow suspected that "in" did some extra work in case the "find"
failed; guess I should have looked at the code instead...  I didn't really
expect anyone to use a bad implementation of a brute-force algorithm
(O(nm)) when the library already contained a reasonably good version
of the same algorithm.

> And then there's the question of how the re module gets to be faster
> still; I suppose it doesn't bother with memcmp() at all.

the benchmark cheats (a bit) -- it builds a state machine (KMP-style) in
"compile", and uses that to search in O(n) time.

that approach won't fly for "in" and find, of course, but it's definitely
possible to make them run a lot faster than RE (i.e. O(n/m) for most cases)...

but refactoring the contains code to use find_internal sounds like a good
first step.  any takers?
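
For readers unfamiliar with the KMP-style precomputation mentioned above, here is a textbook sketch of it. It illustrates the general technique only and is not the re module's code; both function names are made up. The table is built once per pattern (the "compile" step) and the search then never moves backwards in the text.

```python
def build_failure_table(pattern):
    """For each prefix, length of its longest proper prefix that is also a suffix."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_find(text, pattern, table=None):
    """Return the index of the first match, or -1, scanning text once."""
    if not pattern:
        return 0
    if table is None:
        table = build_failure_table(pattern)  # the one-time setup cost
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k and ch != pattern[k]:
            k = table[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1
```

Passing a prebuilt table to kmp_find corresponds to compiling the pattern outside the timing loop, which is why the comparison with "in" is not quite fair.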

 





[Python-Dev] 2.4 func.__name__ breakage

2005-02-16 Thread Tim Peters
Rev 2.66 of funcobject.c made func.__name__ writable for the first
time.  That's great, but the patch also introduced what I'm pretty
sure was an unintended incompatibility:  after 2.66, func.__name__ was
no longer *readable* in restricted execution mode.  I can't think of a
good reason to restrict reading func.__name__, and it looks like this
part of the change was an accident.  So, unless someone objects soon,
I intend to restore that func.__name__ is readable regardless of
execution mode (but will continue to be unwritable in restricted
execution mode).

Objections?

Tres Seaver filed a bug report (some Zope tests fail under 2.4 because of this):

http://www.python.org/sf/1124295


Re: [Python-Dev] Re: string find(substring) vs. substring in string

2005-02-16 Thread Raymond Hettinger
> but refactoring the contains code to use find_internal sounds like a good
> first step.  any takers?

I'm up for it.
 

Raymond Hettinger




[Python-Dev] Re: string find(substring) vs. substring in string

2005-02-16 Thread Fredrik Lundh

> memcmp still compiles to REP CMPB on many x86 compilers, and the setup
> overhead for memcmp sucks on modern x86 hardware

make that "compiles to REPE CMPSB" and "the setup overhead for
REPE CMPSB"

 





[Python-Dev] Re: string find(substring) vs. substring in string

2005-02-16 Thread Scott David Daniels
Irmen de Jong wrote:
> There's the Boyer-Moore string search algorithm which is
> allegedly much faster than a simplistic scanning approach,
> and I also found this: http://portal.acm.org/citation.cfm?id=79184
> So perhaps there's room for improvement :)

The problem is setup vs. run.  If the question is 'ab' in 'rabcd',
Boyer-Moore and other fancy searches will be swamped with prep time.
In Fred's comparison with re, he does the re.compile(...) outside of
the timing loop.  You need to decide what the common case is.
The longer the thing you are searching in, the more one-time-only
overhead you can afford to reduce the per-search-character cost.
--Scott David Daniels
[EMAIL PROTECTED]


Re: [Python-Dev] Re: string find(substring) vs. substring in string

2005-02-16 Thread Guido van Rossum
> The longer the thing you are searching in, the more one-time-only
> overhead you can afford to reduce the per-search-character cost.

Only if you don't find it close to the start.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


[Python-Dev] Re: string find(substring) vs. substring in string

2005-02-16 Thread Scott David Daniels
Fredrik Lundh wrote:
> (btw, the benchmark was taken from jim hugunin's ironpython talk, and
> seems to be carefully designed to kill performance also for more advanced
> algorithms -- including boyer-moore)

Looking for "not there" in "not the xyz"*100 using Boyer-Moore should do
about 300 probes once the table is set (the underscores below):

not the xyznot the xyznot the xyz...
not ther_
         not the__
           not ther_
                    not the__
                      not ther_
...
-- Scott David Daniels
[EMAIL PROTECTED]
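
The probe count can be checked with a small experiment. The sketch below uses the Horspool simplification of Boyer-Moore (bad-character shifts only), which is an assumption made for brevity and not necessarily the variant the message has in mind; the function name is made up. The shift table is a dict holding only the pattern's own characters, so its size does not depend on the alphabet.

```python
def horspool_find(text, pattern):
    """Return (index or -1, number of character probes) for a Horspool search."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0, 0
    # Setup: shift distance for each character of the pattern except the last.
    # Later occurrences overwrite earlier ones, so the rightmost one wins.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    probes = 0
    pos = 0
    while pos <= n - m:
        j = m - 1
        while j >= 0:  # compare right to left
            probes += 1
            if text[pos + j] != pattern[j]:
                break
            j -= 1
        if j < 0:
            return pos, probes
        # Characters absent from the pattern allow a jump of the full length.
        pos += shift.get(text[pos + m - 1], m)
    return -1, probes
```

Calling horspool_find("not the xyz" * 100, "not there") walks the text in alternating jumps of 9 and 2 characters, as in the diagram, and its probe count lands near the figure of about 300 given in the message.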


[Python-Dev] Re: string find(substring) vs. substring in string

2005-02-16 Thread Fredrik Lundh
Scott David Daniels wrote:

> Looking for "not there" in "not the xyz"*100 using Boyer-Moore should do
> about 300 probes once the table is set (the underscores below):
>
> not the xyznot the xyznot the xyz...
> not ther_
>          not the__
>            not ther_
>                     not the__
>                       not ther_
> ...

yup; it gets into a 9/2/9/2 rut. tweak the pattern a little, and you get better
results for BM.

("kill" is of course an understatement, but BM usually works better.  but it
still needs a sizeof(alphabet) table, so you can pretty much forget about it
if you want to support unicode...)
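
To put a number on the table-size concern, here is a sketch that compares a dense table with one slot per code point against a sparse table holding only the characters of the pattern. It uses the Horspool simplification with a dict as the sparse table; both choices are assumptions made for illustration, not something proposed in this thread, and the names horspool_shifts and horspool_find are made up.

```python
import sys


def horspool_shifts(pattern):
    """Sparse shift table: only characters in pattern[:-1] get an entry."""
    m = len(pattern)
    return {ch: m - 1 - k for k, ch in enumerate(pattern[:-1])}


def horspool_find(text, pattern):
    """Index of the first match of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    shifts = horspool_shifts(pattern)
    i = 0
    while i <= n - m:
        if text[i:i + m] == pattern:
            return i
        # Characters missing from the sparse table get the full shift of m.
        i += shifts.get(text[i + m - 1], m)
    return -1


pattern = "not there"
sparse_entries = len(horspool_shifts(pattern))
dense_entries = sys.maxunicode + 1
print(sparse_entries, "entries in the sparse table")
print(dense_entries, "entries in a dense per-code-point table")
```

The price of the sparse table is a hash lookup per shift instead of an array index, which may or may not be acceptable in a C implementation.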

 





Re: [Python-Dev] Windows Low Fragementation Heap yields speedup of ~15%

2005-02-16 Thread Martin v. Löwis
Gfeller Martin wrote:
> Nevertheless, I tried to convert the heap used by Python
> to a Windows Low Fragmentation Heap (available on XP
> and 2003 Server). This improved the overall run time
> of a typical CPU-intensive report by about 15%
> (overall run time is in the 5 minutes range), with the
> same memory consumption.

I must admit that I'm surprised. I would have expected
that most allocations in Python go through obmalloc, so
the heap would only see "large" allocations.
It would be interesting to find out, in your application,
why it is still an improvement to use the low-fragmentation
heaps.

Regards,
Martin


Re: [Python-Dev] string find(substring) vs. substring in string

2005-02-16 Thread Dennis Allison
Boyer-Moore and variants need a bit of preprocessing on the pattern which
makes them great for long patterns but more costly for short ones.

On Wed, 16 Feb 2005, Irmen de Jong wrote:

> Mike Brown wrote:
> > Fredrik Lundh wrote:
> > 
> >>any special reason why "in" is faster if the substring is found, but
> >>a lot slower if it's not in there?
> > 
> > 
> > Just guessing here, but in general I would think that it would stop
> > searching as soon as it found it, whereas until then, it keeps looking,
> > which takes more time. But I would also hope that it would be smart
> > enough to know that it doesn't need to look past the 2nd character in
> > 'not the xyz' when it is searching for 'not there' (due to the lengths
> > of the sequences).
> 
> There's the Boyer-Moore string search algorithm which is
> allegedly much faster than a simplistic scanning approach,
> and I also found this: http://portal.acm.org/citation.cfm?id=79184
> So perhaps there's room for improvement :)
> 
> --Irmen



Re: [Python-Dev] Windows Low Fragementation Heap yields speedup of ~15%

2005-02-16 Thread Evan Jones
On Feb 16, 2005, at 18:42, Martin v. Löwis wrote:
> I must admit that I'm surprised. I would have expected
> that most allocations in Python go through obmalloc, so
> the heap would only see "large" allocations.
> It would be interesting to find out, in your application,
> why it is still an improvement to use the low-fragmentation
> heaps.

Hmm... This is an excellent point. A grep through the Python source
code shows that the following files call the native system malloc (I've
excluded a few obviously platform specific files). A quick visual
inspection shows that most of these are using it to allocate some sort
of array or string, so it likely *should* go through the system malloc.
Gfeller, any idea if you are using any of the modules on this list? If
so, it would be pretty easy to try converting them to call the obmalloc
functions instead, and see how that affects the performance.

Evan Jones
Demo/pysvr/pysvr.c
Modules/_bsddb.c
Modules/_curses_panel.c
Modules/_cursesmodule.c
Modules/_hotshot.c
Modules/_sre.c
Modules/audioop.c
Modules/bsddbmodule.c
Modules/cPickle.c
Modules/cStringIO.c
Modules/getaddrinfo.c
Modules/main.c
Modules/pyexpat.c
Modules/readline.c
Modules/regexpr.c
Modules/rgbimgmodule.c
Modules/svmodule.c
Modules/timemodule.c
Modules/zlibmodule.c
PC/getpathp.c
Python/strdup.c
Python/thread.c
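
A list like this can be regenerated with a short script. The sketch below is only an approximation of such a grep: the exact command used is not given in this message, so the regular expression, the function name files_calling_malloc and the tiny demo tree are all assumptions. It also reports malloc( inside comments and strings, so the output still needs the same visual inspection.

```python
import re
import tempfile
from pathlib import Path

# A bare malloc( not preceded by an identifier character, so names such as
# PyMem_Malloc or xmalloc are not reported.
MALLOC_CALL = re.compile(r"(?<![A-Za-z0-9_])malloc\s*\(")


def files_calling_malloc(root):
    """Return sorted relative paths of .c files under root that mention malloc(."""
    root = Path(root)
    hits = []
    for path in root.rglob("*.c"):
        # latin-1 never fails to decode, which is enough for a text search.
        source = path.read_text(encoding="latin-1")
        if MALLOC_CALL.search(source):
            hits.append(path.relative_to(root).as_posix())
    return sorted(hits)


# Tiny made-up tree, used only to exercise the function.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp)
    (demo / "Modules").mkdir()
    (demo / "Modules" / "uses_malloc.c").write_text("char *p = malloc(10);\n")
    (demo / "Modules" / "uses_pymem.c").write_text("char *p = PyMem_Malloc(10);\n")
    demo_hits = files_calling_malloc(demo)

print(demo_hits)
```

Pointing files_calling_malloc at a real source checkout would give the candidate list for conversion.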


Re: [Python-Dev] builtin_id() returns negative numbers

2005-02-16 Thread Greg Ewing
Richard Brodie wrote:
> Otherwise, unless I misunderstand integer unification, one would
> just have to strike the distinction between, say, %d and %u.

Couldn't that be done anyway? The distinction really only
makes sense in C, where there's no way of knowing whether
the value is signed or unsigned otherwise. In Python the
value itself knows whether it's signed or not.
--
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
[EMAIL PROTECTED]                  +--------------------------------------+


Re: [Python-Dev] builtin_id() returns negative numbers

2005-02-16 Thread Guido van Rossum
> > Otherwise, unless I misunderstand integer unification, one would
> > just have to strike the distinction between, say, %d and %u.
> 
> Couldn't that be done anyway? The distinction really only
> makes sense in C, where there's no way of knowing whether
> the value is signed or unsigned otherwise. In Python the
> value itself knows whether it's signed or not.

The time machine is at your service: in Python 2.4 there's no
difference. That's integer unification for you!
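
A quick check of that claim, as a sketch run on a current Python 3 interpreter rather than on 2.4 itself (the sample values are arbitrary):

```python
# %u is accepted by the % operator and formats exactly like %d:
# the sign comes from the value, not from the conversion character.
values = [-5, 0, 7, 2**31, 2**64 + 1]

same = all("%d" % v == "%u" % v == str(v) for v in values)
print(same)

print("%d / %u" % (-5, -5))
```

Negative values keep their sign under both conversions, and values beyond the C integer range format without any special handling.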

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] license issues with profiler.py and md5.h/md5c.c

2005-02-16 Thread Gregory P. Smith
fyi - i've updated the python sha1/md5 openssl patch.  it now replaces
the entire sha and md5 modules with a generic hashes module that gives
access to all of the hash algorithms supported by OpenSSL (including
appropriate legacy interface wrappers and falling back to the old code
when compiled without openssl).

 
https://sourceforge.net/tracker/index.php?func=detail&aid=1121611&group_id=5470&atid=305470

I don't quite like the module name 'hashes' that i chose for the
generic interface (too close to the builtin hash() function).  Other
suggestions on a module name?  'digest' comes to mind.
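
For readers who want to see what a generic hash interface of this kind looks like in use, here is a sketch with hashlib from today's standard library. The name hashlib does not appear in this message, and the calls shown are hashlib's, not necessarily those of the patch linked above; the input bytes are arbitrary.

```python
import hashlib

data = b"python-dev"

# Generic constructor: the algorithm is picked by name at run time.
generic = hashlib.new("sha1", data).hexdigest()

# Named constructor for the same algorithm.
named = hashlib.sha1(data).hexdigest()
print(generic == named)

# Incremental hashing in the style of the older md5 and sha modules.
h = hashlib.md5()
h.update(b"python")
h.update(b"-dev")
print(h.hexdigest() == hashlib.md5(data).hexdigest())
```

Which algorithm names hashlib.new() accepts depends on how the interpreter was built; hashlib.algorithms_available lists them for the running interpreter.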

-greg
