Alexander Belopolsky <belopol...@users.sourceforge.net> added the comment:
On Sat, Dec 4, 2010 at 3:11 PM, Mark Dickinson <rep...@bugs.python.org> wrote:
>
> Mark Dickinson <dicki...@gmail.com> added the comment:
>
..
> One issue is that we'd still need the char* -> double operations, partly
> because PyOS_string_to_double is part of the public API, and partly to
> continue to support creation of a float from a bytes instance.
>

I thought about it. I see two solutions:

1. Retain PyOS_string_to_double unchanged and add PyOS_unicode_to_double.
2. Reimplement PyOS_string_to_double as a UTF-8 decode whose result is
   passed to PyOS_unicode_to_double.

> The other issue is that for floats, it's difficult to separate the parser
> from the base conversion; to be useful, we'd probably end up making the
> whole of dtoa.c Py_UNICODE aware.

That's what I had in mind. Naively, it looks like we just need to replace
the char type with Py_UNICODE in several places, assuming exotic digit
conversion is still handled separately.

> (One of the return values from the dtoa.c parser is a pointer to the
> significant digits in the original input string; so the base-conversion
> calculation itself needs access to portions of the original string.)
>

Maybe we should start with int(). It is simpler, but will probably reveal
some of the same difficulties as float().

> Ideally, for float(string), we'd have a zero-copy setup that operated
> directly on the unicode input (read-only); but I think that achieving that
> right now is going to be messy, and involve dtoa.c knowing far more about
> Unicode than I'd be comfortable with.
>

This is clearly a 3.3-ish project. Hopefully in time people will realize
that decimal digits are just [0-9] and numeric experts will not be required
to know about Unicode beyond the 127th code point. :-)

> N.B. If we didn't have to deal with alternative digits, it *really* would
> be much simpler.
>

We still don't. I've already separated this out and we can keep it this way
as long as people are willing to pay the price for alternative digits'
support.

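To illustrate what keeping the digit translation separate from the parser could look like, here is a minimal Python sketch. It is only an analogy for the C code: the helper names transform_decimal_to_ascii, parse_ascii_int, and parse_int are hypothetical and are not CPython APIs. The first step maps any Unicode decimal digit to its ASCII counterpart; the second step is a parser that knows nothing beyond [0-9] and a sign.

```python
import unicodedata


def transform_decimal_to_ascii(text):
    """Return a copy of text with Unicode decimal digits replaced by ASCII digits.

    Hypothetical helper, loosely modeled on the idea behind
    PyUnicode_TransformDecimalToASCII(); all other characters pass through.
    """
    out = []
    for ch in text:
        if ord(ch) < 128:
            # ASCII characters never need translation.
            out.append(ch)
            continue
        value = unicodedata.decimal(ch, None)  # None if ch is not a decimal digit
        out.append(ch if value is None else str(value))
    return "".join(out)


def parse_ascii_int(text):
    """Parse a base-10 integer using only ASCII knowledge (hypothetical helper)."""
    body = text.strip(" \t\n")
    sign = 1
    if body[:1] in ("+", "-"):
        sign = -1 if body[0] == "-" else 1
        body = body[1:]
    if not body or any(c not in "0123456789" for c in body):
        raise ValueError(f"invalid literal: {text!r}")
    value = 0
    for c in body:
        value = value * 10 + (ord(c) - ord("0"))
    return sign * value


def parse_int(text):
    """Translate digits first, then hand off to the ASCII-only parser."""
    return parse_ascii_int(transform_decimal_to_ascii(text))
```

With this split, parse_int("\u0661\u0662\u0663") (Arabic-Indic digits) goes through the same ASCII-only parser as parse_int("123"), and the cost of supporting alternative digits is confined to the translation step.
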
One thing we may improve is to fail earlier on non-digits in
PyUnicode_TransformDecimalToASCII() to speed up not-uncommon code like this:

for line in f:
    try:
        n = int(line)
    except ValueError:
        pass
    ...

> Perhaps a compromise option is available, that does a preliminary pass on
> the Unicode string and only makes a copy if non-European digits are
> discovered.

Hmm. That would require changing the signature of
PyUnicode_TransformDecimalToASCII() to take PyObject* instead of the buffer.
I knew we shouldn't have rushed to make it public. We can still do it in
longobject.c and friends' boilerplate.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue10557>
_______________________________________