[c-api] Transmutation of an extension object into a read-only buffer when adding an integer in-place.

2012-08-10 Thread Giacomo Alzetta
I'm trying to implement a C extension which defines a new class (ModPolynomial
on the Python side, ModPoly on the C side).
At the moment I'm writing the in-place addition, but I get a *really* strange 
behaviour.

Here's the code for the in-place addition:

#define ModPoly_Check(v) (PyObject_TypeCheck(v, &ModPolyType))
[...]
static PyObject *
ModPoly_InPlaceAdd(PyObject *self, PyObject *other)
{
    if (!ModPoly_Check(self)) {
        // This should never occur for in-place addition, am I correct?
        if (!ModPoly_Check(other)) {
            PyErr_SetString(PyExc_TypeError,
                            "Neither argument is a ModPolynomial.");
            return NULL;
        }
        return ModPoly_InPlaceAdd(other, self);
    } else {
        if (!PyInt_Check(other) && !PyLong_Check(other)) {
            Py_INCREF(Py_NotImplemented);
            return Py_NotImplemented;
        }
    }

    ModPoly *Tself = (ModPoly *)self;
    PyObject *tmp, *tmp2;
    tmp = PyNumber_Add(Tself->ob_item[0], other);
    tmp2 = PyNumber_Remainder(tmp, Tself->n_modulus);

    Py_DECREF(tmp);
    tmp = Tself->ob_item[0];
    Tself->ob_item[0] = tmp2;
    Py_DECREF(tmp);

    printf("%d\n", (int)ModPoly_Check(self));
    return self;
}

And here's an example usage:

>>> from algebra import polynomials
>>> pol = polynomials.ModPolynomial(3,17)
>>> pol += 5
1
>>> pol

>>> 

Now, how come my ModPolynomial suddenly becomes a read-only buffer, even though 
that last printf tells us that the object returned is of the correct type?
If I raise an exception instead of returning self, the ModPolynomial gets 
incremented correctly. If I use the Py_RETURN_NONE macro, the ModPolynomial is 
correctly replaced by None.


Re: [c-api] Transmutation of an extension object into a read-only buffer when adding an integer in-place.

2012-08-10 Thread Giacomo Alzetta
On Friday, 10 August 2012 at 11:22:13 UTC+2, Hans Mulder wrote:
> On 10/08/12 10:20:00, Giacomo Alzetta wrote:
> > I'm trying to implement a C extension which defines a new
> > class (ModPolynomial on the Python side, ModPoly on the C side).
> > At the moment I'm writing the in-place addition, but I get a *really*
> > strange behaviour.
> >
> > Here's the code for the in-place addition:
> >
> > [code snipped]
>
> I have no experience writing extensions in C, but as I see it,
> you're returning a new reference to self, so you'd need:
>
>     Py_INCREF(self);
>
> If you don't, then a Py_DECREF inside the assignment operator
> causes your polynomial to be garbage collected.  Its heap slot
> is later used for the unrelated buffer object you're seeing.
>
> Hope this helps,
>
> -- HansM

Yes, you're right. I didn't think the combined operator would do a Py_DECREF
if the iadd operation was implemented, but it obviously makes sense.
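
For reference, the corrected tail of ModPoly_InPlaceAdd just takes out a new
reference before handing self back; a minimal sketch (same ModPoly fields as
in the code above):

    ...
    tmp = Tself->ob_item[0];
    Tself->ob_item[0] = tmp2;
    Py_DECREF(tmp);

    /* The += machinery receives a new reference and Py_DECREFs it when it
     * rebinds the name, so we must own the reference we return; without
     * this INCREF the polynomial is freed and its memory reused. */
    Py_INCREF(self);
    return self;
}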


Re: [c-api] Transmutation of an extension object into a read-only buffer when adding an integer in-place.

2012-08-10 Thread Giacomo Alzetta
On Friday, 10 August 2012 at 14:21:50 UTC+2, Hans Mulder wrote:
> On 10/08/12 11:25:36, Giacomo Alzetta wrote:
> > On Friday, 10 August 2012 at 11:22:13 UTC+2, Hans Mulder wrote:
> [...]
> > Yes, you're right. I didn't think the combined operator would do a
> > Py_DECREF if the iadd operation was implemented, but it obviously
> > makes sense.
>
> The += operator cannot know if the iadd returns self or a newly created
> object.  Mutable types usually do the former; non-mutable types must do
> the latter.
>
> Come to think of it: why are your polynomials mutable?
>
> As a mathematician, I would think of polynomials as elements of
> some kind of ring, and I'd expect them to be non-mutable.
>
> -- HansM

Usually non-mutable types simply do not implement the iadd operation, and the
interpreter then falls back to the plain add (see intobject.c, longobject.c,
etc.).

I've decided to make my polynomials mutable for efficiency.
I want to implement the AKS primality test, and if I want to use it with big
numbers the number of coefficients of a polynomial can easily go up to
1k-10k-100k; using non-mutable polynomials would mean allocating and freeing
that much memory for almost every operation.
Like this I have to allocate/free far less frequently.

[Even though I must admit that this is a premature optimization :s, since I
haven't profiled anything; but until I implement it I won't be able to see
how much time I gain.]


Re: [c-api] Transmutation of an extension object into a read-only buffer when adding an integer in-place.

2012-08-10 Thread Giacomo Alzetta
On Friday, 10 August 2012 at 20:50:08 UTC+2, Stefan Behnel wrote:
> Giacomo Alzetta, 10.08.2012 10:20:
> > I'm trying to implement a C extension which defines a new
> > class (ModPolynomial on the Python side, ModPoly on the C side).
> > At the moment I'm writing the in-place addition, but I get a *really*
> > strange behaviour.
>
> You should take a look at Cython. It makes these things way easier and
> safer than with manually written C code. It will save you a lot of code,
> debugging and general hassle.
>
> Stefan

I already know about Cython, but I hope to learn a bit about how Python works
on the C side by writing this extension.

Also, this work is going to be included in a research project I'm doing, so
I'd prefer to stick to Python and C, without having to include Cython sources
or Cython-generated C modules (which I know are almost completely unreadable
from a human point of view, or at least the ones I saw were).

Anyway, thank you for the suggestion.


Re: [c-api] Transmutation of an extension object into a read-only buffer when adding an integer in-place.

2012-08-11 Thread Giacomo Alzetta
On Saturday, 11 August 2012 at 08:40:18 UTC+2, Stefan Behnel wrote:
> Giacomo Alzetta, 11.08.2012 08:21:
> > I'd prefer to stick to Python and C, without having to include Cython
> > sources or Cython-generated C modules (which I know are almost
> > completely unreadable from a human point of view, or at least the ones
> > I saw were).
>
> And the cool thing is: you don't have to read them. :)
>
> Stefan

Yes, but since all this code will end up in the hands of some examiner, he'll
have to read them. :)


in-place exponentiation incongruities

2012-08-11 Thread Giacomo Alzetta
I've noticed some incongruities regarding in-place exponentiation.

On the C side nb_inplace_power is a ternary function, like nb_power (see here:
http://docs.python.org/c-api/typeobj.html?highlight=numbermethods#PyNumberMethods).

Obviously you can't pass the third argument using the usual in-place syntax
"**=".
Nevertheless I'd expect to be able to provide the third argument using
operator.ipow, but the operator module accepts only the two-parameter variant.

The Number Protocol specifies that the ipow operation "is the equivalent of
the Python statement o1 **= o2 when o3 is Py_None, or an in-place variant of
pow(o1, o2, o3) otherwise."

Since "operator" claims to contain a "function port" of the operators, I'd
expect it to implement ipow with three arguments.
I don't see any problem in adding the third argument to it (I mean, any code
that calls ipow(a, b, c) right now [if it exists] is broken, because it will
just raise a TypeError, so adding the argument would not break any code, and
it would provide more functionality).

Also, I don't think there are many objects in the built-ins or the standard
library which implement in-place exponentiation, so there won't be much code
to change.

So my question is: why are there these incongruities?
Is there any chance to see this fixed (in the operator module, or by changing
the documentation)?

By the way: I'm asking this because I'm implementing a C extension and I'd
like to implement both pow and ipow. And since it's about polynomials over
(Z/nZ)[x]/(x^r - 1), using the third argument always makes sense.
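
To make the asymmetry concrete, here is a hypothetical skeleton of such a slot
for the ModPoly type from the previous thread (the names are mine; only the
ternary signature and the Py_None convention come from the documented number
protocol):

static PyObject *
ModPoly_InPlacePower(PyObject *self, PyObject *exp, PyObject *mod)
{
    /* "p **= e" reaches this slot with mod == Py_None; a non-None mod can
     * only arrive through a direct C-level call to PyNumber_InPlacePower,
     * since neither the syntax nor operator.ipow can pass it. */
    if (mod == Py_None) {
        /* plain in-place exponentiation */
    } else {
        /* in-place variant of pow(self, exp, mod) */
    }
    /* ... update self's coefficients in place ... */
    Py_INCREF(self);    /* hand back a new reference, as discussed earlier */
    return self;
}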


Re: in-place exponentiation incongruities

2012-08-12 Thread Giacomo Alzetta
On Sunday, 12 August 2012 at 06:28:10 UTC+2, Steven D'Aprano wrote:
> On Sat, 11 Aug 2012 09:54:56 -0700, Giacomo Alzetta wrote:
>
> > I've noticed some incongruities regarding in-place exponentiation.
> >
> > On the C side nb_inplace_power is a ternary function, like nb_power (see
> > here:
> > http://docs.python.org/c-api/typeobj.html?highlight=numbermethods#PyNumberMethods).
> >
> > Obviously you can't pass the third argument using the usual in-place
> > syntax "**=". Nevertheless I'd expect to be able to provide the third
> > argument using operator.ipow. But the operator module accepts only the
> > two-parameter variant.
>
> Why? The operator module implements the ** operator, not the pow()
> function. If you want the pow() function, you can just use it directly,
> no need to use operator.pow or operator.ipow.
>
> Since ** is a binary operator, it can only accept two arguments.
>
> > The Number Protocol specifies that the ipow operation "is the equivalent
> > of the Python statement o1 **= o2 when o3 is Py_None, or an in-place
> > variant of pow(o1, o2, o3) otherwise."
>
> Where is that from?
>
> --
> Steven

From the Number Protocol (http://docs.python.org/c-api/number.html).
The full text is:

PyObject* PyNumber_InPlacePower(PyObject *o1, PyObject *o2, PyObject *o3)
Return value: New reference.

**See the built-in function pow().** Returns NULL on failure. The operation
is done in-place when o1 supports it. This is the equivalent of the Python
statement o1 **= o2 when o3 is Py_None, or an in-place variant of pow(o1, o2,
o3) otherwise. If o3 is to be ignored, pass Py_None in its place (passing
NULL for o3 would cause an illegal memory access).

The first thing this text does is refer to the **function** pow, which takes
three arguments. And since the documentation of the operator module states
that "The operator module exports a set of efficient functions corresponding
to the intrinsic operators of Python.", I'd expect ipow to take three
arguments, the third being optional.

With normal exponentiation you have ** referring to the 2-argument variant,
and "pow" providing the ability to use the third argument.
With in-place exponentiation, at the moment, you have "**=" referring to the
2-argument variant (and this is consistent), while operator.ipow also refers
to it. So providing an ipow with the third argument would just increase
consistency in the language, and provide a feature that at the moment is not
present. (Well, if the designers of Python cared that much about consistency
they'd probably add an "ipow" built-in function, so that you don't have to
import it from "operator".)

I understand that it's not a feature that would be used often, but I can't
see why not allow it.
At the moment you can do it from the C side, because you can call
PyNumber_InPlacePower directly, but from the Python side you have no way to
do it, except writing a C extension that wraps the PyNumber_InPlacePower
function.
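
Such a wrapper is only a few lines. A minimal sketch for Python 2 (the module
and function names here are hypothetical; the only API used is the documented
PyNumber_InPlacePower):

#include <Python.h>

/* ipow3(o1, o2[, o3]) -- expose the three-argument in-place power. */
static PyObject *
ipow3(PyObject *module, PyObject *args)
{
    PyObject *o1, *o2, *o3 = Py_None;

    if (!PyArg_ParseTuple(args, "OO|O:ipow3", &o1, &o2, &o3))
        return NULL;
    /* Py_None (never NULL) must be passed when the modulus is unused. */
    return PyNumber_InPlacePower(o1, o2, o3);  /* new reference, or NULL on error */
}

static PyMethodDef ipowutil_methods[] = {
    {"ipow3", ipow3, METH_VARARGS, "In-place pow(o1, o2[, o3])."},
    {NULL, NULL, 0, NULL}
};

PyMODINIT_FUNC
initipowutil(void)
{
    Py_InitModule("ipowutil", ipowutil_methods);
}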


Re: in-place exponentiation incongruities

2012-08-12 Thread Giacomo Alzetta
On Sunday, 12 August 2012 at 13:03:08 UTC+2, Steven D'Aprano wrote:
> On Sun, 12 Aug 2012 00:14:27 -0700, Giacomo Alzetta wrote:
>
> > From the Number Protocol (http://docs.python.org/c-api/number.html). The
> > full text is:
> >
> > [documentation for PyNumber_InPlacePower snipped]
> >
> > The first thing this text does is refer to the **function** pow, which
> > takes three arguments. And since the documentation of the operator
> > module states that "The operator module exports a set of efficient
> > functions corresponding to the intrinsic operators of Python.", I'd
> > expect ipow to take three arguments, the third being optional.
>
> Why? There is no three-argument operator. There is a three-argument
> function, pow, but you don't need the operator module for that, it is
> built-in. There is no in-place three-argument operator, and no in-place
> three-argument function in the operator module.
>
> Arguing from "consistency" is not going to get you very far, since it is
> already consistent:
>
> In-place binary operator: **=
> In-place binary function: operator.ipow
>
> In-place three-argument operator: none
> In-place three-argument function: none
>
> If you decide to make a feature-request, you need to argue from
> usefulness, not consistency.
>
> http://bugs.python.org
>
> Remember that the Python 2.x branch is now in feature-freeze, so new
> features only apply to Python 3.x.
>
> > With normal exponentiation you have ** referring to the 2-argument
> > variant, and "pow" providing the ability to use the third argument.
>
> Correct.
>
> > With in-place exponentiation, at the moment, you have "**=" referring
> > to the 2-argument variant (and this is consistent), while operator.ipow
> > also refers to it.
>
> Correct. Both **= and ipow match the ** operator, which only takes two
> arguments.
>
> > So providing an ipow with the third argument would just
> > increase consistency in the language,
>
> Consistency with something other than **= would be inconsistency.
>
> > and provide a feature that at the moment is not present. (Well, if the
> > designers of Python cared that much about consistency they'd probably
> > add an "ipow" built-in function, so that you don't have to import it
> > from "operator".)
>
> Not everything needs to be a built-in function.
>
> --
> Steven

Probably I've mixed things up.

What I mean is: when you implement a new type as a C extension you have to
provide the special methods through the PyNumberMethods struct. In this
struct both the power and the in-place power slots take three arguments.
Now, suppose I implement the three-argument variant of in-place power in a
class. No user would be able to call my C function with a non-None third
argument, while they would be able to call the normal version with the third
argument.

This is what I find inconsistent: either provide a way to call the in-place
exponentiation with three arguments too, or define it as a binary function.
Having it as a ternary function, but only from the C side, is quite strange.



Re: in-place exponentiation incongruities

2012-08-14 Thread Giacomo Alzetta
On Sunday, 12 August 2012 at 23:53:46 UTC+2, Terry Reedy wrote:
> Are you actually planning to do this, or is this purely theoretical?

Yes, I do plan to implement ipow.

> Not true. Whether the function is coded in Python or C
>
>     cls.__ipow__(base, exp, mod)  # or
>     base.__ipow__(exp, mod)
>
> > while he would be able to
> > call the normal version with the third argument.

Yes, that's true. But I find that calling a special method that way is
*really* ugly. I'd think the pow built-in function was probably introduced
precisely to avoid writing base.__pow__(exp, mod).




Re: Why doesn't Python remember the initial directory?

2012-08-19 Thread Giacomo Alzetta
On Sunday, 19 August 2012 at 22:42:16 UTC+2, kj wrote:
> As far as I've been able to determine, Python does not remember
> (immutably, that is) the working directory at the program's start-up,
> or, if it does, it does not officially expose this information.
>
> Does anyone know why this is?  Is there a PEP stating the rationale
> for it?
>
> Thanks!

You can obtain the working directory with os.getcwd().

giacomo@jack-laptop:~$ echo 'import os; print os.getcwd()' > testing-dir.py
giacomo@jack-laptop:~$ python testing-dir.py 
/home/giacomo
giacomo@jack-laptop:~$ cd Documenti
giacomo@jack-laptop:~/Documenti$ python ../testing-dir.py 
/home/giacomo/Documenti
giacomo@jack-laptop:~/Documenti$ 

Obviously using os.chdir() will change the working directory, after which
os.getcwd() will no longer return the start-up working directory; but if you
need the start-up directory you can read it at start-up and save it in some
constant.


Add a "key" parameter to bisect* functions?

2012-09-12 Thread Giacomo Alzetta
I've just noticed that the bisect module lacks the key parameter.

The documentation points to a recipe that could be used to handle a sorted
collection, but I think it's overkill if I want to bisect my sequence only
once or twice with a key. Having something like `bisect(sequence, key=my_key)`
would be much easier and would be consistent with other operations such as
max/min/sorted.

Is there some reason behind this lack?


Re: Add a "key" parameter to bisect* functions?

2012-09-12 Thread Giacomo Alzetta
On Wednesday, 12 September 2012 at 17:54:31 UTC+2, Giacomo Alzetta wrote:
> I've just noticed that the bisect module lacks the key parameter.
> [...]
> Is there some reason behind this lack?

Uhm, I've found this piece of documentation:
http://docs.python.org/library/bisect.html#other-examples

Now, what is stated there does not make sense to me:

1) Adding a key/reversed parameter won't slow down calls to bisect that do
not use those parameters; at least, a good implementation should guarantee
that.

2) Yes, providing a key means that either bisect should do some caching in
order to speed up multiple look-ups, or the keys have to be recomputed every
time. But I think that using this as a reason not to provide the parameter is
wrong. It's up to the user to decide what has to be done to make the code
fast. If he has to do multiple look-ups then he should precompute the keys
(but this is true also for sort etc.), but if he just has to use this feature
once then computing them on the fly is simply perfect.

3) Also, the fact that you can bisect only in ascending order sounds strange
to me. It's not hard to bisect in both directions...

Probably it would be correct to document possible pitfalls of the eventually
added "key" and "reversed" parameters (such as repeated key evaluation etc.),
but I can't see other cons.

Also, this change would be 100% backward compatible.


Re: Add a "key" parameter to bisect* functions?

2012-09-12 Thread Giacomo Alzetta
On Wednesday, 12 September 2012 at 18:05:10 UTC+2, Miki Tebeka wrote:
> > I've just noticed that the bisect module lacks the key parameter.
> > ...
> > Is there some reason behind this lack?
>
> See full discussion at http://bugs.python.org/issue4356. Guido said it's
> going in, however there's no time frame for it.

Oh, nice. So eventually it'll be added.

Correct me if I'm wrong, but the reason for this absence is that people could
think that doing repeated bisects using keys would be fast, even though the
keys have to be recomputed?

I think that could simply be fixed with a note/warning in the documentation,
so that you know to create keys with caching or use precomputed keys when you
have to do a lot of bisects.


Missing modules compiling python3.3

2012-11-04 Thread Giacomo Alzetta
I'm trying to compile python3.3 on my (K)ubuntu 12.04, but some modules are 
missing.

In particular when doing make test I get:

Python build finished, but the necessary bits to build these modules were not
found:
_bz2               _curses            _curses_panel
_dbm               _gdbm              _lzma
_sqlite3           _tkinter           readline
To find the necessary bits, look in setup.py in detect_modules() for the
module's name.

Also, the "test_urlwithfrag" test fails, but when trying (as suggested in the
README) "./python -m test -v test_urlwithfrag" I get an error [ImportError:
No module named 'test.test_urlwithfrag'], and when doing
"./python -m test -v test_urllib2net" it skips the test, saying that's normal
on Linux (then why does make test run it, and why does it fail?).

What am I missing? Should I install those modules manually? Is this expected?


Re: Missing modules compiling python3.3

2012-11-04 Thread Giacomo Alzetta
On Sunday, 4 November 2012 at 15:56:03 UTC+1, mm0fmf wrote:
> Giacomo Alzetta wrote:
> > I'm trying to compile python3.3 on my (K)ubuntu 12.04, but some modules
> > are missing.
> >
> > In particular when doing make test I get:
> >
> > Python build finished, but the necessary bits to build these modules
> > were not found:
> > _bz2               _curses            _curses_panel
>
> You haven't installed the development headers for those modules giving
> errors.
>
> So for curses you'll need to install libncurses5-dev, lzma-dev etc.
> Sorry, I can't remember the package names as it's a while since I did this.
>
> Andy

That's right! Sorry, I thought I had installed those some months ago for
another Python installation, but I've probably removed them since :s

The "test_urlwithfrag" test is still failing though.


Inconsistent behaviour of str.find/str.index when providing optional parameters

2012-11-21 Thread Giacomo Alzetta
I just came across this:

>>> 'spam'.find('', 5)
-1


Now, reading find's documentation:

>>> print(str.find.__doc__)
S.find(sub [,start [,end]]) -> int

Return the lowest index in S where substring sub is found,
such that sub is contained within S[start:end].  Optional
arguments start and end are interpreted as in slice notation.

Return -1 on failure.

Now, the empty string is a substring of every string, so how can find fail?
From the doc, find should generally be equivalent to
S[start:end].find(substring) + start, except when the substring is not found;
but since the empty string is a substring of the empty string (here, the
slice), it should never fail.

Looking at the source code for find (in stringlib/find.h):

Py_LOCAL_INLINE(Py_ssize_t)
stringlib_find(const STRINGLIB_CHAR* str, Py_ssize_t str_len,
               const STRINGLIB_CHAR* sub, Py_ssize_t sub_len,
               Py_ssize_t offset)
{
    Py_ssize_t pos;

    if (str_len < 0)
        return -1;

I believe it should be:

    if (str_len < 0)
        return (sub_len == 0 ? 0 : -1);

Is there any reason for having this unexpected behaviour, or was it simply
overlooked?


Re: Inconsistent behaviour of str.find/str.index when providing optional parameters

2012-11-21 Thread Giacomo Alzetta
On Wednesday, 21 November 2012 at 20:25:10 UTC+1, Hans Mulder wrote:
> On 21/11/12 17:59:05, Alister wrote:
> > On Wed, 21 Nov 2012 04:43:57 -0800, Giacomo Alzetta wrote:
> >
> >> I just came across this:
> >>
> >> >>> 'spam'.find('', 5)
> >> -1
> >>
> >> [rest of the original post snipped]
> >
> > why would you be searching for an empty string?
> > what result would you expect to get from such a search?
>
> In general, if
>
>     needle in haystack[start:]
>
> returns True, then you'd expect
>
>     haystack.find(needle, start)
>
> to return the smallest i >= start such that
>
>     haystack[i:i+len(needle)] == needle
>
> also returns True.
>
> >>> "" in "spam"[5:]
> True
> >>> "spam"[5:5+len("")] == ""
> True
> >>>
>
> So, you'd expect that "spam".find("", 5) would return 5.
>
> The only other consistent position would be that "spam"[5:]
> should raise an IndexError, because 5 is an invalid index.
>
> For that matter, I wouldn't mind if "spam".find(s, 5) were
> to raise an IndexError.  But if slicing at position 5
> produces an empty string, then .find should be able to
> find that empty string.
>
> -- HansM

Exactly! Either string[i:] with i >= len(string) should raise an IndexError,
or string.find('', i) should return i.

Anyway, thinking about it, this inconsistency can be solved in a simpler way,
without adding a comparison: you simply check the substring length first. If
it is 0 you already know it is a substring of the given string, and you
return the offset; so the two ifs at the beginning of the function just ought
to be swapped.
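
Concretely, the swap would look like this (a sketch against the stringlib_find
shown in the first message; the empty-substring check returning offset is the
one that currently follows the length check in stringlib/find.h, and only the
order of the two changes):

    /* Handle the empty substring before the negative-length check, so that
     * e.g. 'spam'.find('', 5) would return 5 instead of -1. */
    if (sub_len == 0)
        return offset;
    if (str_len < 0)
        return -1;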


Re: Inconsistent behaviour of str.find/str.index when providing optional parameters

2012-11-21 Thread Giacomo Alzetta
On Thursday, 22 November 2012 at 05:00:39 UTC+1, MRAB wrote:
> On 2012-11-22 03:41, Terry Reedy wrote:
> It can't return 5 because 5 isn't an index in 'spam'.
>
> It can't return 4 because 4 is below the start index.

Uhm. Maybe you are right, because returning a greater value would cause an
IndexError, but then *why* is 4 returned???

>>> 'spam'.find('', 4)
4
>>> 'spam'[4]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range

4 is not a valid index either. I do not think the behaviour was completely
intentional. If find is supposed to return indexes, then 'spam'.find('', 4)
must be -1, because 4 is not a valid index. If find is supposed to behave as
if creating the slice and checking whether the substring is in the slice,
then 'spam'.find('', i) should return i for every integer i >= 4.

The docstring does not describe this edge case, so I think it could be
improved.
If the first sentence (being an index in S) is kept, then it shouldn't say
that start and end are treated as in slice notation, because that's actually
not true. It should be added that if start is greater than or equal to len(S)
then -1 is always returned (and in this case 'spam'.find('', 4) -> -1).
If find should not guarantee that the value returned is a valid index (when
start isn't a valid index), then the first sentence should be rephrased to
avoid giving that idea (and the comparisons in stringlib/find.h should be
swapped to get that behaviour).
For example, maybe, it could be "Return the lowest index where substring sub
is found (in S?), such that sub is contained in S[start:end]. ..."


Re: Inconsistent behaviour of str.find/str.index when providing optional parameters

2012-11-22 Thread Giacomo Alzetta
On Thursday, 22 November 2012 at 09:44:21 UTC+1, Steven D'Aprano wrote:
> On Wed, 21 Nov 2012 23:01:47 -0800, Giacomo Alzetta wrote:
>
> > On Thursday, 22 November 2012 at 05:00:39 UTC+1, MRAB wrote:
> >> On 2012-11-22 03:41, Terry Reedy wrote: It can't return 5 because 5
> >> isn't an index in 'spam'.
> >>
> >> It can't return 4 because 4 is below the start index.
> >
> > Uhm. Maybe you are right, because returning a greater value would cause
> > an IndexError, but then *why* is 4 returned???
> >
> > >>> 'spam'.find('', 4)
> > 4
> > >>> 'spam'[4]
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in <module>
> > IndexError: string index out of range
> >
> > 4 is not a valid index either. I do not think the behaviour was
> > completely intentional.
>
> The behaviour is certainly an edge case, but I think it is correct.
>
> (Correct or not, it has been the same going all the way back to Python
> 1.5, before strings even had methods, so it almost certainly will not be
> changed. Changing the behaviour now will very likely break hundreds,
> maybe thousands, of Python programs that expect the current behaviour.)


My point was not to change the behaviour, but only to point out this possible
inconsistency between what str.find/str.index do and what they claim to do in
the documentation.

Anyway, I'm not so sure that changing the behaviour would break many
programs... I mean, the change would only impact code that looks for an empty
string past the string's bounds. I don't often see the start and end
parameters of find/index used at all, and I don't think I've ever seen
someone use them out of bounds. Add searching for the empty string on top of
that and I think the number of programs breaking would be minimal. And even
if they break, it would be really easy to fix them.

Anyway, I understand what you mean and maybe it's better to keep this (at
least to me) odd behaviour for backwards compatibility.

> By this logic, "spam".find("", 4) should return 4, because cut #4 is
> immediately to the left of the empty string. So Python's current
> behaviour is justified.
>
> What about "spam".find("", 5)? Well, if you look at the string with the
> cuts marked as before:
>
> 0-1-2-3-4
> |s|p|a|m|
>
> you will see that there is no cut #5. Since there is no cut #5, we can't
> sensibly say we found *anything* there, not even the empty string. If you
> have four boxes, you can't say that you found anything in the fifth box.
>
> I realise that this behaviour clashes somewhat with the slicing rule that
> says that if the slice indexes go past the end of the string, you get an
> empty string. But that rule is more for convenience than a fundamental
> rule about strings.

Yeah, I understand what you're saying, but the logic you point out is never
stated anywhere, while slices are explicitly mentioned in the docstring.


> > The docstring does not describe this edge case, so I think it could be
> > improved. If the first sentence (being an index in S) is kept, then it
> > shouldn't say that start and end are treated as in slice notation,
> > because that's actually not true.
>
> +1
>
> I think that you are right that the documentation needs to be improved.

Definitely. The sentence "Optional arguments start and end are interpreted as
in slice notation." should be changed to something like:

"Optional arguments start and end are interpreted as in slice notation,
unless start is (strictly?) greater than the length of S or end is smaller
than start, in which case the search always fails."

In this way 'spam'.find('', 4) *is* documented: start = len(S), so start and
end are treated as in slice notation and 4 makes sense; 'spam'.find('', 5) ->
-1 because 5 > len('spam') and thus the search fails; and
'spam'.find('', 3, 2) -> -1 also makes sense because 2 < 3 (this edge case
makes more sense, even though 'spam'[3:2] is still the empty string...).