[issue1276] LookupError: unknown encoding: X-MAC-JAPANESE

2007-10-13 Thread jos

New submission from jos:

When I compile Python-3.0a1 on Mac OS X with a Japanese locale,
I get a LookupError like the one below.

==
running build_scripts
creating build/scripts-3.0
Traceback (most recent call last):
  File "./setup.py", line 1572, in <module>
    main()
  File "./setup.py", line 1567, in main
    'Lib/smtpd.py']
  File "/private/tmp/Python-3.0a1/Lib/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/private/tmp/Python-3.0a1/Lib/distutils/dist.py", line 943, in run_commands
    self.run_command(cmd)
  File "/private/tmp/Python-3.0a1/Lib/distutils/dist.py", line 963, in run_command
    cmd_obj.run()
  File "/private/tmp/Python-3.0a1/Lib/distutils/command/build.py", line 106, in run
    self.run_command(cmd_name)
  File "/private/tmp/Python-3.0a1/Lib/distutils/cmd.py", line 317, in run_command
    self.distribution.run_command(command)
  File "/private/tmp/Python-3.0a1/Lib/distutils/dist.py", line 963, in run_command
    cmd_obj.run()
  File "/private/tmp/Python-3.0a1/Lib/distutils/command/build_scripts.py", line 51, in run
    self.copy_scripts()
  File "/private/tmp/Python-3.0a1/Lib/distutils/command/build_scripts.py", line 82, in copy_scripts
    first_line = f.readline()
  File "/private/tmp/Python-3.0a1/Lib/io.py", line 1259, in readline
    decoder = self._decoder or self._get_decoder()
  File "/private/tmp/Python-3.0a1/Lib/io.py", line , in _get_decoder
    make_decoder = codecs.getincrementaldecoder(self._encoding)
  File "/private/tmp/Python-3.0a1/Lib/codecs.py", line 951, in getincrementaldecoder
    decoder = lookup(encoding).incrementaldecoder
LookupError: unknown encoding: X-MAC-JAPANESE
make: *** [sharedmods] Error 1
==

This problem happens because no appropriate codec exists, so it
also occurs in applications that use getdefaultencoding.

After patching Tools/unicode/Makefile, running make generates
build/mac_japanese.py, the mac-japanese codec.
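
The failure in the traceback can be reproduced in isolation: codecs.lookup() raises LookupError for any name the codec registry cannot resolve. The following is a minimal sketch only. The helper name resolve_encoding and the bogus codec name are hypothetical, and falling back to ASCII is an illustrative choice, not what the attached patch does.

```python
import codecs


def resolve_encoding(name, fallback="ascii"):
    """Return the canonical codec name for `name`, or for `fallback` if unknown.

    Hypothetical helper for illustration; not part of the attached patch.
    """
    try:
        return codecs.lookup(name).name
    except LookupError:
        # Same exception type as the one at the bottom of the traceback.
        return codecs.lookup(fallback).name


# "x-no-such-codec" is a deliberately bogus name used to trigger the fallback.
print(resolve_encoding("x-no-such-codec"))
print(resolve_encoding("UTF-8"))
```

Whether a silent fallback is acceptable depends on the caller; during a build it may be safer to fail loudly, as the original code does.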

--
components: Build, Demos and Tools, Library (Lib), Macintosh, Unicode
files: x_mac_japanese.diff
messages: 56386
nosy: josm
severity: normal
status: open
title: LookupError: unknown encoding: X-MAC-JAPANESE

__
Tracker <[EMAIL PROTECTED]>

__

x_mac_japanese.diff
Description: Binary data
___
Python-bugs-list mailing list 
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue1251] ssl module doesn't support non-blocking handshakes

2007-10-13 Thread Bill Janssen

Bill Janssen added the comment:

It's my mistake; I was looking at too many patches at the same time.
Thanks for the example.

Bill




[issue1272] Decode __file__ and co_filename to unicode using fs default

2007-10-13 Thread Guido van Rossum

Changes by Guido van Rossum:


--
assignee:  -> gvanrossum
nosy: +gvanrossum




[issue1272] Decode __file__ and co_filename to unicode using fs default

2007-10-13 Thread Guido van Rossum

Guido van Rossum added the comment:

Couple of nits:

- You added a removal of hotshot from setup.py to the patch; but that's
been checked in in the mean time.

- Why add an 'errors' argument to the function when it's a fatal error
to use it?

- Using 0 to autodetect the length is scary.  Normally we have two APIs
for that, one ..._FromString and one ...FromStringAndSize.  If you
really don't want that, please use -1, which is at least an illegal value.

- Why is there code in codeobject.c::PyCode_New() that still accepts a
PyString for the filename?

- In that file (and possibly others, I didn't check) your code uses
spaces to indent while the surrounding code uses tabs.  Moreover, your
space indent seems to assume there are 4 spaces to a tab, but all our
code (Python and C) is formatted assuming tabs are 8 spaces.  (The
indent isn't always 8 spaces -- but ASCII TAB characters always are 8,
for us.)

- Why copy the default encoding before mangling it?  With a little extra
care you will only have to copy it once.  Also, consider not mangling at
all, but assuming the encoding comes in a canonical form -- several
other functions assume that, e.g. PyUnicode_Decode() and
PyUnicode_AsEncodedString().

- I haven't run the unit tests yet.  Will be doing that next...




[issue1272] Decode __file__ and co_filename to unicode using fs default

2007-10-13 Thread Alexandre Vassalotti

Alexandre Vassalotti added the comment:

I found a few problems in your patch. In PyCode_New() the type check
for the filename argument is incorrect:

--- Objects/codeobject.c	(revision 58412)
+++ Objects/codeobject.c	(working copy)
@@ -59,7 +59,7 @@
freevars == NULL || !PyTuple_Check(freevars) ||
cellvars == NULL || !PyTuple_Check(cellvars) ||
name == NULL || (!PyString_Check(name) && !PyUnicode_Check(name)) ||
-   filename == NULL || !PyString_Check(filename) ||
+   filename == NULL || (!PyString_Check(name) &&
!PyUnicode_Check(name)) ||
lnotab == NULL || !PyString_Check(lnotab) ||
!PyObject_CheckReadBuffer(code)) {
PyErr_BadInternalCall();

@@ -260,6 +267,8 @@
ourcellvars = PyTuple_New(0);
if (ourcellvars == NULL)
goto cleanup;
+filename = PyUnicode_DecodeFSDefault(PyString_AS_STRING(filename),
+ 0, NULL);


The following is unnecessary and will cause a reference leak:


@@ -260,6 +267,8 @@
ourcellvars = PyTuple_New(0);
if (ourcellvars == NULL)
goto cleanup;
+filename = PyUnicode_DecodeFSDefault(PyString_AS_STRING(filename),
+ 0, NULL);
 
co = (PyObject *)PyCode_New(argcount, kwonlyargcount,
nlocals, stacksize, flags,


I think the interface of PyUnicode_DecodeFSDefault() could be improved
a bit. The function doesn't use the last argument, 'errors', so why is
it there? I am not sure if it is useful to keep the second argument,
'length', either. So, I believe the function prototype should be
changed to:

PyObject *PyUnicode_DecodeFSDefault(const char *s);

Another thing that I am not sure about is whether it is correct to
consider ISO-8859-15 the same thing as Latin-1.
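
The doubt about treating ISO-8859-15 as Latin-1 can be checked from Python. A small sketch, assuming the standard library codecs "latin-1" and "iso8859-15"; the byte 0xA4 is one position where the two character sets map to different code points:

```python
# Byte 0xA4 is one position where the two character sets differ.
raw = b"\xa4"

as_latin1 = raw.decode("latin-1")      # CURRENCY SIGN, U+00A4
as_latin9 = raw.decode("iso8859-15")   # EURO SIGN, U+20AC

print(hex(ord(as_latin1)), hex(ord(as_latin9)))
```

So a filename containing such a byte would decode differently depending on which of the two encodings is assumed.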

Overall, the patch looks good to me and doesn't cause any test to
fail. I attached an updated patch with the above issues fixed.

Thank you, Christian, for the patch. :)

--
nosy: +alexandre.vassalotti




[issue1272] Decode __file__ and co_filename to unicode using fs default

2007-10-13 Thread Alexandre Vassalotti

Changes by Alexandre Vassalotti:





[issue1272] Decode __file__ and co_filename to unicode using fs default

2007-10-13 Thread Alexandre Vassalotti

Alexandre Vassalotti added the comment:

Guido wrote: 
> Why copy the default encoding before mangling it?  With a little
> extra care you will only have to copy it once.  Also, consider not
> mangling at all, but assuming the encoding comes in a canonical form
> -- several other functions assume that, e.g. PyUnicode_Decode() and
> PyUnicode_AsEncodedString().

It is impossible to guarantee that Py_FileSystemDefaultEncoding is
normalized, since its value can be set using nl_langinfo(CODESET)
during the bootstrapping process. PyUnicode_Decode() and other
decoding/encoding functions use the codecs module, which is not available
during the early bootstrapping process, to normalize the encoding name.
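
To illustrate the normalization that the codecs module provides once it is available, here is a short Python-level sketch. The helper canonical_name is hypothetical, and locale.getpreferredencoding() is used only as a portable stand-in for the nl_langinfo(CODESET) call mentioned above:

```python
import codecs
import locale


def canonical_name(encoding):
    """Normalize an encoding name through the codec registry."""
    return codecs.lookup(encoding).name


# The locale reports the name in whatever spelling the platform uses.
reported = locale.getpreferredencoding(False)
print(reported, "->", canonical_name(reported))

# Different spellings collapse to one canonical name after lookup.
print(canonical_name("UTF8"), canonical_name("utf_8"), canonical_name("UTF-8"))
```

None of this machinery exists before the codec registry is importable, which is the bootstrapping problem described above.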




[issue1276] LookupError: unknown encoding: X-MAC-JAPANESE

2007-10-13 Thread Martin v. Löwis

Changes by Martin v. Löwis:


--
keywords: +patch




[issue1272] Decode __file__ and co_filename to unicode using fs default

2007-10-13 Thread Christian Heimes

Christian Heimes added the comment:

Guido van Rossum wrote:
> - You added a removal of hotshot from setup.py to the patch; but that's
> been checked in in the mean time.

Oh, the change shouldn't have made it into the patch. I guess I forgot an
svn revert on setup.py.

> - Why add an 'errors' argument to the function when it's a fatal error
> to use it?

I wanted the signature of the method to match the other PyUnicode_Decode*
methods. I copied the FatalError from
*_PyUnicode_AsDefaultEncodedString().

> - Using 0 to autodetect the length is scary.  Normally we have two APIs
> for that, one ..._FromString and one ...FromStringAndSize.  If you
> really don't want that, please use -1, which is at least an illegal value.

Oh right, -1 is *much* better for autodetect than 0. What do you prefer,
a second method or -1 as auto detect?

> - Why is there code in codeobject.c::PyCode_New() that still accepts a
> PyString for the filename?

Because I overlooked it; my fault. :/

> - In that file (and possibly others, I didn't check) your code uses
> spaces to indent while the surrounding code uses tabs.  Moreover, your
> space indent seems to assume there are 4 spaces to a tab, but all our
> code (Python and C) is formatted assuming tabs are 8 spaces.  (The
> indent isn't always 8 spaces -- but ASCII TAB characters always are 8,
> for us.)

Some C files like unicodeobject.c are using 4 spaces while other files
are using tabs for indentation. My editor may have gotten confused by the
mix. I've manually fixed it in the patch but I may have overlooked a line
or two.

> - Why copy the default encoding before mangling it?  With a little extra
> care you will only have to copy it once.  Also, consider not mangling at
> all, but assuming the encoding comes in a canonical form -- several
> other functions assume that, e.g. PyUnicode_Decode() and
> PyUnicode_AsEncodedString().

My C is a bit rusty and I still need to learn new tricks. I'm trying to
see if I can remove the extra copy without causing a problem.
The other part of your question was already answered by Alexandre. The
aliases map is defined in Python code. It's not available so early in
the bootstrapping process.
We'd have to redesign the assignment of co_filename and __file__
completely if we want to use the aliases and other codecs. For example
we could store a PyString at first and redo all names once the codecs
are set up.
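
The "store first, decode later" idea can be sketched in Python. This is only a model of the approach under stated assumptions: the class DeferredFilename and its finalize() method are hypothetical names, and "utf-8" stands in for whatever filesystem encoding would eventually be determined.

```python
import codecs


class DeferredFilename:
    """Hold a filename as raw bytes until a codec is known (hypothetical)."""

    def __init__(self, raw):
        self.raw = raw
        self.text = None  # filled in once decoding is possible

    def finalize(self, encoding):
        # Called after the codec machinery is available.
        info = codecs.lookup(encoding)
        self.text, _consumed = info.decode(self.raw)
        return self.text


# Early in startup: only bytes are stored.
pending = [DeferredFilename(b"site.py"), DeferredFilename(b"caf\xc3\xa9.py")]

# Later, once the filesystem encoding is known ("utf-8" is assumed here).
names = [f.finalize("utf-8") for f in pending]
print(names)
```

The cost of this design is that every early consumer of the name must cope with the undecoded form until the second pass has run.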

Christian




[issue1260] PEP 3137: Remove the buffer API from PyUnicode

2007-10-13 Thread Alexandre Vassalotti

Alexandre Vassalotti added the comment:

There was a problem with one of the calls to PyArg_ParseTuple in the OS/2
version of listdir() in the posix module. I also clarified the error
message of the 't' format unit.
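
The effect of removing the buffer API from the unicode type is visible from Python in current Python 3 releases: bytes-like objects export a buffer, str does not. A minimal sketch, with supports_buffer as a hypothetical helper name:

```python
def supports_buffer(obj):
    """Return True if `obj` exports the buffer protocol."""
    try:
        memoryview(obj)
    except TypeError:
        return False
    return True


print(supports_buffer(b"abc"))        # bytes export a buffer
print(supports_buffer(bytearray(3)))  # so do bytearrays
print(supports_buffer("abc"))         # str does not
```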

Index: Python/getargs.c
===
--- Python/getargs.c	(revision 58422)
+++ Python/getargs.c	(working copy)
@@ -1250,7 +1250,7 @@
 arg, msgbuf, bufsize);
 		if (pb == NULL || pb->bf_getbuffer == NULL)
 			return converterr(
-"string or read-only character buffer",
+"bytes or read-only character buffer",
 arg, msgbuf, bufsize);
 
 		if ((*pb->bf_getbuffer)(arg, &view, PyBUF_CHARACTER) != 0) 
Index: Objects/unicodeobject.c
===
--- Objects/unicodeobject.c	(revision 58422)
+++ Objects/unicodeobject.c	(working copy)
@@ -8113,19 +8113,6 @@
 };
 
 
-static int
-unicode_buffer_getbuffer(PyUnicodeObject *self, Py_buffer *view, int flags)
-{
-
-if (flags & PyBUF_CHARACTER) {
-PyErr_SetString(PyExc_SystemError, "can't use str as char buffer");
-return -1;
-}
-return PyBuffer_FillInfo(view, (void *)self->str,
- PyUnicode_GET_DATA_SIZE(self), 1, flags);
-}
-
-
 /* Helpers for PyUnicode_Format() */
 
 static PyObject *
@@ -8819,11 +8806,6 @@
 return NULL;
 }
 
-static PyBufferProcs unicode_as_buffer = {
-(getbufferproc) unicode_buffer_getbuffer,
-NULL,
-};
-
 static PyObject *
 unicode_subtype_new(PyTypeObject *type, PyObject *args, PyObject *kwds);
 
@@ -8907,7 +8889,7 @@
 (reprfunc) unicode_str,	 	/* tp_str */
 PyObject_GenericGetAttr, 		/* tp_getattro */
 0,			 		/* tp_setattro */
-&unicode_as_buffer,			/* tp_as_buffer */
+0, 	/* tp_as_buffer */
 Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE | 
 Py_TPFLAGS_UNICODE_SUBCLASS,	/* tp_flags */
 unicode_doc,			/* tp_doc */
Index: Modules/_sre.c
===
--- Modules/_sre.c	(revision 58422)
+++ Modules/_sre.c	(working copy)
@@ -1674,6 +1674,15 @@
 void* ptr;
 Py_buffer view;
 
+/* Unicode objects do not support the buffer API. So, get the data
+   directly instead. */
+if (PyUnicode_Check(string)) {
+ptr = (void *)PyUnicode_AS_DATA(string);
+*p_length = PyUnicode_GET_SIZE(string);
+*p_charsize = sizeof(Py_UNICODE);
+return ptr;
+}
+
 /* get pointer to string buffer */
 view.len = -1;
 buffer = Py_Type(string)->tp_as_buffer;
Index: Modules/posixmodule.c
===
--- Modules/posixmodule.c	(revision 58422)
+++ Modules/posixmodule.c	(working copy)
@@ -2135,7 +2135,8 @@
 FILEFINDBUF3   ep;
 APIRET rc;
 
-if (!PyArg_ParseTuple(args, "t#:listdir", &name, &len))
+if (!PyArg_ParseTuple(args, "et#:listdir", 
+  Py_FileSystemDefaultEncoding, &name, &len))
 return NULL;
 if (len >= MAX_PATH) {
 		PyErr_SetString(PyExc_ValueError, "path too long");



[issue1272] Decode __file__ and co_filename to unicode using fs default

2007-10-13 Thread Guido van Rossum

Guido van Rossum added the comment:

On 10/13/07, Christian Heimes <[EMAIL PROTECTED]> wrote:
> Guido van Rossum wrote:
> > - Why add an 'errors' argument to the function when it's a fatal error
> > to use it?
>
> I wanted the signature of the method to match the other PyUnicode_Decode*
> methods. I copied the FatalError from
> *_PyUnicode_AsDefaultEncodedString().

But that function is a terrible example; it was done that way because
an earlier version of the function *did* allow using the errors
argument and I wanted to make sure to catch all calls that were still
passing an errors value. This doesn't apply here, we're creating a
brand new API.

> > - Using 0 to autodetect the length is scary.  Normally we have two APIs
> > for that, one ..._FromString and one ...FromStringAndSize.  If you
> > really don't want that, please use -1, which is at least an illegal value.
>
> Oh right, -1 is *much* better for autodetect than 0. What do you prefer,
> a second method or -1 as auto detect?

Even better is Alexandre's version: always autodetect. I think we can
assume that filenames are always available as 0-terminated byte
arrays, since that's how all system calls need them.
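
For readers following along at the Python level, os.fsdecode() and os.fsencode() in current Python 3 offer a rough analogue of decoding a byte filename with the filesystem default encoding. This sketch is not the C function under discussion; the sample bytes are an arbitrary example.

```python
import os
import sys

print(sys.getfilesystemencoding())

raw = b"caf\xc3\xa9.py"    # example bytes; valid UTF-8
text = os.fsdecode(raw)    # bytes -> str using the filesystem encoding
print(repr(text))

# Encoding the result again is expected to give back the original bytes.
roundtrip = os.fsencode(text)
print(roundtrip == raw)
```

Note that neither call takes a length argument: the whole byte string is always decoded, matching the "always autodetect" preference stated above.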

Anyway, let me know if you want to change something in Alexandre's
version or if you want him to check it in.

Oh. Hm. I still wish that PyCode_New() could just insist that the
filename argument is a PyUnicode instance. Why can't it? Perhaps the
caller should be fixed instead?




[issue1272] Decode __file__ and co_filename to unicode using fs default

2007-10-13 Thread Christian Heimes

Christian Heimes added the comment:

Guido van Rossum wrote:
> But that function is a terrible example; it was done that way because
> an earlier version of the function *did* allow using the errors
> argument and I wanted to make sure to catch all calls that were still
> passing an errors value. This doesn't apply here, we're creating a
> brand new API.

Ahhh! I really didn't know that. I thought that the nonfunctional
arguments had a purpose.

> Anyway, let me know if you want to change something in Alexandre's
> version or if you want him to check it in.

I'm going to create a new patch based on his fixes and your recommendations.

> Oh. Hm. I still wish that PyCode_New() could just insist that the
> filename argument is a PyUnicode instance. Why can't it? Perhaps the
> caller should be fixed instead?

I'll try.




[issue1272] Decode __file__ and co_filename to unicode using fs default

2007-10-13 Thread Guido van Rossum

Guido van Rossum added the comment:

> > Oh. Hm. I still wish that PyCode_New() could just insist that the
> > filename argument is a PyUnicode instance. Why can't it? Perhaps the
> > caller should be fixed instead?

> I'll try.

I figured out the problem -- it came from marshalled old code objects. 
If you throw away all .pyc files the problem goes away.  You can also
get rid of the similar checks for the 'name' argument -- this should
just be a PyUnicode too.  A systematic approach to invalidating all the
.pyc files is updating the magic number in import.c.
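
The magic number check can be sketched in Python. The layout below follows the MAGIC macro in import.c (a 16-bit version followed by '\r' and '\n'); the helper names build_magic and pyc_is_current are hypothetical, and importlib.util.MAGIC_NUMBER is the modern way to read the running interpreter's value.

```python
import importlib.util
import struct


def build_magic(version):
    """Pack a magic version as a 16-bit little-endian number plus CR LF."""
    return struct.pack("<H", version) + b"\r\n"


def pyc_is_current(header):
    """True if the first four bytes match the running interpreter's magic."""
    return header[:4] == importlib.util.MAGIC_NUMBER


print(build_magic(3080))
print(pyc_is_current(build_magic(3080)))
print(pyc_is_current(importlib.util.MAGIC_NUMBER + b"rest of header"))
```

Because stale files fail this comparison, bumping the number forces every .pyc to be regenerated without anyone deleting files by hand.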




[issue1272] Decode __file__ and co_filename to unicode using fs default

2007-10-13 Thread Christian Heimes

Christian Heimes added the comment:

Guido van Rossum wrote:
> - Why copy the default encoding before mangling it?  With a little extra
> care you will only have to copy it once. 

Now I remember why I added the strncpy() call plus encoding[31] = '\0'.
I wanted to make sure that the code doesn't break even if the encoding
name is longer than 31 + 1 chars. I'm aware that it's very unlikely
but I didn't want to take chances. You are allowed to call me paranoid. *g*

Christian




[issue1272] Decode __file__ and co_filename to unicode using fs default

2007-10-13 Thread Guido van Rossum

Guido van Rossum added the comment:

Well, you could ensure that by checking that you haven't reached the
end of the mangling buffer. That will have the added advantage that
when the input is something silly like 32 spaces followed by "utf-8"
it will still be mangled correctly. The slight extra cost of the
check (could be a single pointer compare) is offset by saving a call
to strncpy().
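
The bounded mangling loop described here can be modeled in Python. This is a sketch under assumptions, not the patch's C code: the function name mangle_encoding, the 31-character limit, and the rule "keep only letters and digits, lower-cased" are illustrative choices inferred from the discussion.

```python
def mangle_encoding(name, limit=31):
    """Lower-case `name` and keep only letters and digits, up to `limit` chars.

    The bound is checked while writing output, so no truncating copy of
    the input is needed beforehand.
    """
    out = []
    for ch in name:
        if len(out) >= limit:  # output buffer full: stop here
            break
        if ch.isalnum():
            out.append(ch.lower())
    return "".join(out)


print(mangle_encoding("UTF-8"))
print(mangle_encoding(" " * 32 + "utf-8"))
print(mangle_encoding("ISO_8859-1"))
```

Because the limit applies to the output rather than the input, leading junk such as the 32 spaces does not consume the budget.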

--Guido

On 10/13/07, Christian Heimes <[EMAIL PROTECTED]> wrote:
>
> Christian Heimes added the comment:
>
> Guido van Rossum wrote:
> > - Why copy the default encoding before mangling it?  With a little extra
> > care you will only have to copy it once.
>
> Now I remember why I added the strncpy() call plus encoding[31] = '\0'.
> I wanted to make sure that the code doesn't break even if the encoding
> name is longer than 31 + 1 chars. I'm aware that it's very unlikely
> but I didn't want to take chances. You are allowed to call me paranoid. *g*
>
> Christian




[issue1264] __file__ and co_filename as unicode

2007-10-13 Thread Guido van Rossum

Changes by Guido van Rossum:


--
resolution:  -> out of date
status: open -> closed
superseder:  -> Decode __file__ and co_filename to unicode using fs default




[issue1267] Py3K cannot run as ``python -S``

2007-10-13 Thread Guido van Rossum

Changes by Guido van Rossum:


--
nosy: +gvanrossum




[issue1268] array unittest problems with UCS4 build

2007-10-13 Thread Guido van Rossum

Guido van Rossum added the comment:

Can this be closed now that Travis reverted his patch?

--
nosy: +gvanrossum




[issue1260] PEP 3137: Remove the buffer API from PyUnicode

2007-10-13 Thread Guido van Rossum

Guido van Rossum added the comment:

You can check this in. You do have checkin privs, right?

--
resolution:  -> accepted




[issue1272] Decode __file__ and co_filename to unicode using fs default

2007-10-13 Thread Alexandre Vassalotti

Alexandre Vassalotti added the comment:

Guido wrote:
> I figured out the problem -- it came from marshalled old code objects. 
> If you throw away all .pyc files the problem goes away.  You can also
> get rid of the similar checks for the 'name' argument -- this should
> just be a PyUnicode too.  A systematic approach to invalidating all the
> .pyc files is updating the magic number in import.c.

Done.

I had to remove a few other PyString instances in pyexpat.c and
_ctypes.c. So, here is (hopefully) the final version of the patch.

The changes from the last version are:

   - Correct a typo in one of the comments in PyUnicode_DecodeFSDefault
   - Specified in the API doc of PyUnicode_DecodeFSDefault that the
     function takes a null-terminated string.
   - Bumped the magic number in import.c
   - Fix PyCode_New calls in _ctypes and pyexpat module.
   - Remove the PyString type check on 'filename' and 'name' in PyCode_New.
   - Remove the unneeded string coercion code from PyCode_New.
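
The end result of these changes can be observed from Python in current Python 3 releases, where a code object's filename and name are str instances. A small sketch; "example.py" is an arbitrary filename chosen for the example:

```python
code = compile("def f():\n    pass\n", "example.py", "exec")

print(type(code.co_filename), code.co_filename)
print(type(code.co_name), code.co_name)

# Nested code objects carry the same filename as the module-level one.
inner = [c for c in code.co_consts if hasattr(c, "co_filename")]
print(inner[0].co_name, inner[0].co_filename)
```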

Index: Python/ceval.c
===
--- Python/ceval.c	(revision 58422)
+++ Python/ceval.c	(working copy)
@@ -767,7 +767,7 @@
 	lltrace = PyDict_GetItemString(f->f_globals, "__lltrace__") != NULL;
 #endif
 #if defined(Py_DEBUG) || defined(LLTRACE)
-	filename = PyString_AsString(co->co_filename);
+	filename = PyUnicode_AsString(co->co_filename);
 #endif
 
 	why = WHY_NOT;
Index: Python/traceback.c
===
--- Python/traceback.c	(revision 58422)
+++ Python/traceback.c	(working copy)
@@ -229,10 +229,10 @@
 	while (tb != NULL && err == 0) {
 		if (depth <= limit) {
 			err = tb_displayline(f,
-			PyString_AsString(
+			PyUnicode_AsString(
 tb->tb_frame->f_code->co_filename),
 			tb->tb_lineno,
-			PyString_AsString(tb->tb_frame->f_code->co_name));
+			PyUnicode_AsString(tb->tb_frame->f_code->co_name));
 		}
 		depth--;
 		tb = tb->tb_next;
Index: Python/pythonrun.c
===
--- Python/pythonrun.c	(revision 58422)
+++ Python/pythonrun.c	(working copy)
@@ -867,7 +867,8 @@
 		return -1;
 	d = PyModule_GetDict(m);
 	if (PyDict_GetItemString(d, "__file__") == NULL) {
-		PyObject *f = PyString_FromString(filename);
+		PyObject *f;
+		f = PyUnicode_DecodeFSDefault(filename);
 		if (f == NULL)
 			return -1;
 		if (PyDict_SetItemString(d, "__file__", f) < 0) {
Index: Python/import.c
===
--- Python/import.c	(revision 58422)
+++ Python/import.c	(working copy)
@@ -74,10 +74,11 @@
 		  3040 (added signature annotations)
 		  3050 (print becomes a function)
 		  3060 (PEP 3115 metaclass syntax)
-  3070 (PEP 3109 raise changes)
+		  3070 (PEP 3109 raise changes)
+		  3080 (PEP 3137 make __file__ and __name__ unicode)
 .
 */
-#define MAGIC (3070 | ((long)'\r'<<16) | ((long)'\n'<<24))
+#define MAGIC (3080 | ((long)'\r'<<16) | ((long)'\n'<<24))
 
 /* Magic word as global; note that _PyImport_Init() can change the
value of this global to accommodate for alterations of how the
@@ -652,7 +653,7 @@
 	/* Remember the filename as the __file__ attribute */
 	v = NULL;
 	if (pathname != NULL) {
-		v = PyString_FromString(pathname);
+		v = PyUnicode_DecodeFSDefault(pathname);
 		if (v == NULL)
 			PyErr_Clear();
 	}
@@ -983,7 +984,7 @@
 		PySys_WriteStderr("import %s # directory %s\n",
 			name, pathname);
 	d = PyModule_GetDict(m);
-	file = PyString_FromString(pathname);
+	file = PyUnicode_DecodeFSDefault(pathname);
 	if (file == NULL)
 		goto error;
 	path = Py_BuildValue("[O]", file);
Index: Python/compile.c
===
--- Python/compile.c	(revision 58422)
+++ Python/compile.c	(working copy)
@@ -4001,7 +4001,7 @@
 	freevars = dict_keys_inorder(c->u->u_freevars, PyTuple_Size(cellvars));
 	if (!freevars)
 	goto error;
-	filename = PyString_FromString(c->c_filename);
+	filename = PyUnicode_DecodeFSDefault(c->c_filename);
 	if (!filename)
 		goto error;
 
Index: Python/importdl.c
===
--- Python/importdl.c	(revision 58422)
+++ Python/importdl.c	(working copy)
@@ -62,7 +62,9 @@
 		return NULL;
 	}
 	/* Remember the filename as the __file__ attribute */
-	if (PyModule_AddStringConstant(m, "__file__", pathname) < 0)
+	PyObject *path;
+	path = PyUnicode_DecodeFSDefault(pathname);
+	if (PyModule_AddObject(m, "__file__", path) < 0)
 		PyErr_Clear(); /* Not important enough to report */
 
 	if (_PyImport_FixupExtension(name, pathname) == NULL)
Index: Include/unicodeobject.h
===
--- Include/unicodeobject.h	(revision 58422)
+++ Include/unicodeobject.h	(working

[issue1272] Decode __file__ and co_filename to unicode using fs default

2007-10-13 Thread Alexandre Vassalotti

Alexandre Vassalotti added the comment:

> Remove the PyString type check on 'filename' and 'name' in PyCode_New.

Oops. I removed one of the ! in the checks by mistake.

Index: Python/ceval.c
===
--- Python/ceval.c	(revision 58454)
+++ Python/ceval.c	(working copy)
@@ -767,7 +767,7 @@
 	lltrace = PyDict_GetItemString(f->f_globals, "__lltrace__") != NULL;
 #endif
 #if defined(Py_DEBUG) || defined(LLTRACE)
-	filename = PyString_AsString(co->co_filename);
+	filename = PyUnicode_AsString(co->co_filename);
 #endif
 
 	why = WHY_NOT;
Index: Python/traceback.c
===
--- Python/traceback.c	(revision 58454)
+++ Python/traceback.c	(working copy)
@@ -229,10 +229,10 @@
 	while (tb != NULL && err == 0) {
 		if (depth <= limit) {
 			err = tb_displayline(f,
-			PyString_AsString(
+			PyUnicode_AsString(
 tb->tb_frame->f_code->co_filename),
 			tb->tb_lineno,
-			PyString_AsString(tb->tb_frame->f_code->co_name));
+			PyUnicode_AsString(tb->tb_frame->f_code->co_name));
 		}
 		depth--;
 		tb = tb->tb_next;
Index: Python/pythonrun.c
===
--- Python/pythonrun.c	(revision 58454)
+++ Python/pythonrun.c	(working copy)
@@ -867,7 +867,8 @@
 		return -1;
 	d = PyModule_GetDict(m);
 	if (PyDict_GetItemString(d, "__file__") == NULL) {
-		PyObject *f = PyString_FromString(filename);
+		PyObject *f;
+		f = PyUnicode_DecodeFSDefault(filename);
 		if (f == NULL)
 			return -1;
 		if (PyDict_SetItemString(d, "__file__", f) < 0) {
Index: Python/import.c
===
--- Python/import.c	(revision 58454)
+++ Python/import.c	(working copy)
@@ -74,10 +74,11 @@
 		  3040 (added signature annotations)
 		  3050 (print becomes a function)
 		  3060 (PEP 3115 metaclass syntax)
-  3070 (PEP 3109 raise changes)
+		  3070 (PEP 3109 raise changes)
+		  3080 (PEP 3137 make __file__ and __name__ unicode)
 .
 */
-#define MAGIC (3070 | ((long)'\r'<<16) | ((long)'\n'<<24))
+#define MAGIC (3080 | ((long)'\r'<<16) | ((long)'\n'<<24))
 
 /* Magic word as global; note that _PyImport_Init() can change the
value of this global to accommodate for alterations of how the
@@ -652,7 +653,7 @@
 	/* Remember the filename as the __file__ attribute */
 	v = NULL;
 	if (pathname != NULL) {
-		v = PyString_FromString(pathname);
+		v = PyUnicode_DecodeFSDefault(pathname);
 		if (v == NULL)
 			PyErr_Clear();
 	}
@@ -983,7 +984,7 @@
 		PySys_WriteStderr("import %s # directory %s\n",
 			name, pathname);
 	d = PyModule_GetDict(m);
-	file = PyString_FromString(pathname);
+	file = PyUnicode_DecodeFSDefault(pathname);
 	if (file == NULL)
 		goto error;
 	path = Py_BuildValue("[O]", file);
Index: Python/compile.c
===
--- Python/compile.c	(revision 58454)
+++ Python/compile.c	(working copy)
@@ -4001,7 +4001,7 @@
 	freevars = dict_keys_inorder(c->u->u_freevars, PyTuple_Size(cellvars));
 	if (!freevars)
 	goto error;
-	filename = PyString_FromString(c->c_filename);
+	filename = PyUnicode_DecodeFSDefault(c->c_filename);
 	if (!filename)
 		goto error;
 
Index: Python/importdl.c
===
--- Python/importdl.c	(revision 58454)
+++ Python/importdl.c	(working copy)
@@ -62,7 +62,9 @@
 		return NULL;
 	}
 	/* Remember the filename as the __file__ attribute */
-	if (PyModule_AddStringConstant(m, "__file__", pathname) < 0)
+	PyObject *path;
+	path = PyUnicode_DecodeFSDefault(pathname);
+	if (PyModule_AddObject(m, "__file__", path) < 0)
 		PyErr_Clear(); /* Not important enough to report */
 
 	if (_PyImport_FixupExtension(name, pathname) == NULL)
Index: Include/unicodeobject.h
===
--- Include/unicodeobject.h	(revision 58454)
+++ Include/unicodeobject.h	(working copy)
@@ -154,6 +154,7 @@
 # define PyUnicode_DecodeASCII PyUnicodeUCS2_DecodeASCII
 # define PyUnicode_DecodeCharmap PyUnicodeUCS2_DecodeCharmap
 # define PyUnicode_DecodeLatin1 PyUnicodeUCS2_DecodeLatin1
+# define PyUnicode_DecodeFSDefault PyUnicodeUCS2_DecodeFSDefault
 # define PyUnicode_DecodeRawUnicodeEscape PyUnicodeUCS2_DecodeRawUnicodeEscape
 # define PyUnicode_DecodeUTF32 PyUnicodeUCS2_DecodeUTF32
 # define PyUnicode_DecodeUTF32Stateful PyUnicodeUCS2_DecodeUTF32Stateful
@@ -245,6 +246,7 @@
 # define PyUnicode_DecodeASCII PyUnicodeUCS4_DecodeASCII
 # define PyUnicode_DecodeCharmap PyUnicodeUCS4_DecodeCharmap
 # define PyUnicode_DecodeLatin1 PyUnicodeUCS4_DecodeLatin1
+# define PyUnicode_DecodeFSDefault PyUnicodeUCS4_DecodeFSDefault
 # define PyUnicode_DecodeRawUnicodeEscape PyUnicodeUCS4_DecodeRawUnicodeEscape

[issue1260] PEP 3137: Remove the buffer API from PyUnicode

2007-10-13 Thread Alexandre Vassalotti

Alexandre Vassalotti added the comment:

Committed in r58455.




[issue1260] PEP 3137: Remove the buffer API from PyUnicode

2007-10-13 Thread Guido van Rossum

Changes by Guido van Rossum:


--
status: open -> closed




[issue1272] Decode __file__ and co_filename to unicode using fs default

2007-10-13 Thread Guido van Rossum

Guido van Rossum added the comment:

Crys, is this OK with you?

On 10/13/07, Alexandre Vassalotti <[EMAIL PROTECTED]> wrote:
>
> Alexandre Vassalotti added the comment:
>
> Guido wrote:
> > I figured out the problem -- it came from marshalled old code objects.
> > If you throw away all .pyc files the problem goes away.  You can also
> > get rid of the similar checks for the 'name' argument -- this should
> > just be a PyUnicode too.  A systematic approach to invalidating all the
> > .pyc files is updating the magic number in import.c.
>
> Done.
>
> I had to remove a few other PyString instances in pyexpat.c and
> _ctypes.c. So, here is (hopefully) the final version of the patch.
>
> The changes from the last version are:
>
>- Correct a typo in one of the comments in PyUnicode_DecodeFSDefault
>- Specified in the API doc of PyUnicode_DecodeFSDefault that the
>  function takes a null-terminated string.
>- Bumped the magic number in import.c
>- Fix PyCode_New calls in _ctypes and pyexpat module.
>- Remove the PyString type check on 'filename' and 'name' in PyCode_New.
>- Remove the unneeded string coercion code from PyCode_New.




[issue1272] Decode __file__ and co_filename to unicode using fs default

2007-10-13 Thread Christian Heimes

Christian Heimes added the comment:

Guido van Rossum wrote:
> Guido van Rossum added the comment:
> 
> Crys, is this OK with you?

Alexandre's mangle loop doesn't do the same job as mine. Chars like _
and - aren't removed from the encoding name, and the if clauses don't
catch, for example, UTF-8 or ISO-8859-1, only UTF8 or ISO8859-1. He has
also overlooked a PyString_Check in the code repr function.

I'm working on a better mangler and I believe that I can make
Py_FileSystemDefaultEncoding available much earlier in Py_InitializeEx().

*after some debugging*

I fear that we are on the wrong path. Py_FileSystemDefaultEncoding is set
*much* later in the bootstrapping process unless its value is hard
coded (Win32 and Apple only). Any attempt to get the right codec or even
a normalized name without the codecs package is going to be extremely hard.

We have to get the codecs up and Py_FileSystemDefaultEncoding set before
we can decode the filenames. :( I think that the problem needs much more
attention and a proper fix.

Christian
