Hello, first I must beg the pardon of those of you whose mailboxes got flooded by my last announcement of depikt. I myself get no emails from this list, and when I had made my corrections and posted each of the slightly improved versions, I wasn't aware of the extra emails this produced. Sorry!
I read here recently that some regard Python 3 as worse at encoding issues than earlier versions. For me, as a German, quite the contrary is true. The automatic conversion without an exception in versions before 3 has caused me pain after pain over the last years. Just a few weeks ago, pygtk suddenly returned UTF-8-encoded strings instead of unicode, and my software had sent out a lot of garbled, automatically written emails before I noticed the mess. Python 3 would have raised exceptions; however, porting my software to 3 has only just begun.

Now there is a concept of two separate worlds, and I have decided to use bytes for my software. Output needs the encoded representation anyway, and with depikt and a modified apsw (file reads give bytes anyway) or other database APIs (internally they all understand UTF-8), I can get UTF-8 for all input too. This means that I do not have the standard string methods, but substitutes are easily made. Not as a subclass of bytes, though, since that wouldn't have the b"..." initialization, so only in the form of functions. Here are some of my utools (not all with the standard signatures):

    u0 = "".encode('utf-8')

    def u(s):
        if type(s) in (int, float, type):
            s = str(s)
        if type(s) == str:
            return s.encode("utf-8")
        if type(s) == bytes:
            # we keep the two worlds cleanly separated
            raise TypeError("argument is bytes already")
        raise TypeError("Bad argument for utf-encoding")

    def u_startswith(s, test):
        # slicing works for any sequence; no exception handling needed
        return s[:len(test)] == test

    def u_endswith(s, test):
        # guard the length: for an empty test, s[-0:] would be all of s
        return len(test) <= len(s) and s[len(s) - len(test):] == test

    def u_split(s, splitter):
        # leading and doubled splitters produce no empty pieces
        if not splitter:
            raise ValueError("empty splitter")  # would loop forever otherwise
        ret = []
        while s and splitter in s:
            if u_startswith(s, splitter):
                s = s[len(splitter):]
                continue
            i = s.index(splitter)
            ret.append(s[:i])            # append the piece, not its items
            s = s[i + len(splitter):]    # move past the splitter
        return ret + [s]

    def u_join(joiner, l):
        l = list(l)
        if not l:
            return u0
        while len(l) > 1:
            l = [l[0] + joiner + l[1]] + l[2:]
        return l[0]                      # the joined bytes, not a list

Writing them is trivial. Note u0: unfortunately, b"" didn't work at all as I expected, as I had to learn the hard way.
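The strict separation that makes Python 3 raise where older versions converted silently can be illustrated with a minimal sketch. It assumes plain Python 3 only (no pygtk, depikt or apsw involved); `to_bytes` and `mixing_fails` are hypothetical names, with `to_bytes` playing roughly the role of `u` at the input boundary.

```python
# Minimal sketch; to_bytes and mixing_fails are hypothetical helper names.

def to_bytes(value, encoding="utf-8"):
    """Encode text once, at the input boundary; refuse bytes."""
    if isinstance(value, bytes):
        # bytes are already encoded: keep the two worlds apart
        raise TypeError("argument is bytes already")
    if isinstance(value, (int, float)):
        value = str(value)
    if isinstance(value, str):
        return value.encode(encoding)
    raise TypeError("cannot encode %r" % type(value).__name__)


def mixing_fails():
    """Return True if Python refuses to concatenate bytes and str."""
    try:
        b"Gr\xc3\xbc\xc3\x9fe" + " aus Berlin"
    except TypeError:
        return True
    return False


print(to_bytes("Grüße"))   # b'Gr\xc3\xbc\xc3\x9fe'
print(mixing_fails())      # True
```

Decoding with the wrong codec, by contrast, still succeeds silently: `b"Gr\xc3\xbc\xc3\x9fe".decode("latin-1")` yields garbled text rather than an exception, so the encoding at the boundary still has to be chosen correctly.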
Looking more closely at these functions, one sees that they use only the sequence protocol. "index" is now part of the sequence protocol too; the library reference still has to be updated there. Thus all of these, and many more string methods, could be moved into the sequence protocol without much work, and then nobody would have to write all this. This doesn't only affect string-like objects: split and join for lists could open interesting possibilities, for example for list representations of trees. Does anybody want to make a PEP out of this (I won't)?

Joost Behrends
-- 
http://mail.python.org/mailman/listinfo/python-list
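The generalization proposed above can be sketched in plain Python 3. The names `seq_startswith`, `seq_endswith`, `seq_find`, `seq_split` and `seq_join` are hypothetical, not an existing or proposed API. They use only `len()`, slicing, `==` and `+`, so the same code serves bytes, str, lists and tuples; `seq_split` is assumed to follow the semantics of `bytes.split(sep)`, keeping empty pieces.

```python
# Hypothetical generic helpers built only on len(), slicing, == and +.

def seq_startswith(s, prefix):
    """True if the sequence s begins with the subsequence prefix."""
    return s[:len(prefix)] == prefix


def seq_endswith(s, suffix):
    """True if the sequence s ends with the subsequence suffix."""
    return len(suffix) <= len(s) and s[len(s) - len(suffix):] == suffix


def seq_find(s, sub, start=0):
    """Index of the first occurrence of sub at or after start, else -1."""
    for i in range(start, len(s) - len(sub) + 1):
        if s[i:i + len(sub)] == sub:
            return i
    return -1


def seq_split(s, sep):
    """Split s at every occurrence of the non-empty subsequence sep."""
    if len(sep) == 0:
        raise ValueError("empty separator")
    parts = []
    start = 0
    while True:
        i = seq_find(s, sep, start)
        if i < 0:
            parts.append(s[start:])  # the remainder is the last piece
            return parts
        parts.append(s[start:i])
        start = i + len(sep)


def seq_join(sep, parts):
    """Concatenate parts with sep between them."""
    parts = list(parts)
    if not parts:
        return sep[:0]  # an empty sequence of the same type as sep
    result = parts[0]
    for part in parts[1:]:
        result = result + sep + part
    return result


print(seq_split([1, 0, 2, 3, 0, 4], [0]))  # [[1], [2, 3], [4]]
print(seq_join([0], [[1], [2, 3], [4]]))   # [1, 0, 2, 3, 0, 4]
```

For bytes the results agree with the built-in methods of current Python 3 releases, e.g. `seq_split(b"a,,b", b",")` gives `[b"a", b"", b"b"]` just like `b"a,,b".split(b",")`. `seq_find` is a naive scan, chosen for clarity rather than speed.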