Leo Kislov wrote:
> Ron Adam wrote:
>
>> locale.setlocale(locale.LC_ALL, '')  # use current locale settings
>
> It's not current locale settings, it's user's locale settings.
> Application can actually use something else and you will overwrite
> that. You can also affect (unexpectedly to the application)
> time.strftime() and C extensions. So you should move this call into the
> _test() function and put explanation into the documentation that
> application should call locale.setlocale

I'll experiment with this a bit. I was under the impression that
locale.strxfrm needed the locale set for it to work correctly.

Maybe it would be better to have two (or more) versions? A string,
unicode, and locale version, or maybe add an option to __init__ to choose
the behavior? Multiple versions seems to be the pre-py3k approach,
although I was trying to avoid that.

Sigh, of course issues like this are why it is better to have a module to
do this with. If it was as simple as just calling sort() I wouldn't have
bothered. ;-)

>> self.numrex = re.compile(r'([\d\.]*|\D*)', re.LOCALE)
>
> [snip]
>
>> if NUMERICAL in self.flags:
>>     slist = self.numrex.split(s)
>>     for i, x in enumerate(slist):
>>         try:
>>             slist[i] = float(x)
>>         except:
>>             slist[i] = locale.strxfrm(x)
>
> I think you should call locale.atof instead of float, since you call
> re.compile with re.LOCALE.

I think you are correct, but it seems locale.atof() is a *lot* slower than
float(). :(

Here's the locale.atof() code.

    def atof(string,func=float):
        "Parses a string as a float according to the locale settings."
        #First, get rid of the grouping
        ts = localeconv()['thousands_sep']
        if ts:
            string = string.replace(ts, '')
        #next, replace the decimal point with a dot
        dd = localeconv()['decimal_point']
        if dd:
            string = string.replace(dd, '.')
        #finally, parse the string
        return func(string)

I could set ts and dd in __init__ and just do the replacements before the
try...

    if NUMERICAL in self.flags:
        slist = self.numrex.split(s)
        for i, x in enumerate(slist):
            if x:     # slist may contain null strings
                xx = x    # start from x so xx is defined even if self.ts is empty
                if self.ts:
                    xx = xx.replace(self.ts, '')    # remove thousands sep
                if self.dd:
                    xx = xx.replace(self.dd, '.')   # replace decimal point
                try:
                    slist[i] = float(xx)
                except ValueError:
                    slist[i] = locale.strxfrm(x)

How does that look?

It needs a fast way to determine if x is a number or a string. Any
suggestions?

> Everything else looks fine. The biggest missing piece is support for
> unicode strings.

This was the reason for using locale.strxfrm.
From what I could figure out from the documentation, it should work with
unicode strings. Am I missing something?

Thanks,
   Ron
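
On the question of a fast way to tell numbers from strings, one possible approach (a sketch only, not what the module currently does) is to let the regular expression do the classification. When the pattern has exactly one capturing group that matches only numbers, re.split() returns text pieces at even indices and the captured numbers at odd indices, so no try/except is needed. The names make_splitter, decimal_point and thousands_sep below are hypothetical; in the module the two separators would presumably come from locale.localeconv().

```python
import re

def make_splitter(decimal_point=".", thousands_sep=""):
    """Return a function that splits a string into text and float parts.

    Hypothetical helper: the separators are passed in explicitly here;
    they could be taken from locale.localeconv() instead.
    """
    dd = re.escape(decimal_point)
    if thousands_sep:
        # digits, optionally grouped by the thousands separator
        int_part = r"\d+(?:%s\d+)*" % re.escape(thousands_sep)
    else:
        int_part = r"\d+"
    # one capturing group: integer part plus optional fractional part
    numrex = re.compile(r"(%s(?:%s\d+)?)" % (int_part, dd))

    def split(s):
        parts = numrex.split(s)
        # text sits at even indices, captured numbers at odd indices
        for i in range(1, len(parts), 2):
            x = parts[i]
            if thousands_sep:
                x = x.replace(thousands_sep, "")   # remove thousands sep
            if decimal_point != ".":
                x = x.replace(decimal_point, ".")  # normalize decimal point
            parts[i] = float(x)
        return parts

    return split

split = make_splitter()
print(sorted(["a10", "a2", "a1.5"], key=split))
```

Because every key has strings at even positions and floats at odd positions, the resulting lists compare element by element without mixing types. The text pieces could still be passed through locale.strxfrm() if locale-aware ordering of the non-numeric parts is wanted.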