Akihiro KAYAMA wrote: > Hi all. > > I would like to ask how I can implement string-like class using tuple > or list. Does anyone know about some example codes of pure python > implementation of string-like class? > > Because I am trying to use Python for a text processing which is > composed of a large character set. As the character set is wider than > UTF-16(U+10FFFF), I can't use Python's native unicode string class. > "Wider than UTF-16" doesn't make sense.
> So I want to prepare my own string class, which provides convenience > string methods such as split, join, find and others like usual string > class, but it uses a sequence of integer as a internal representation > instead of a native string. Obviously, subclassing of str doesn't > help. > > The implementation of each string methods in the Python source > tree(stringobject.c) is far from python code, so I have started from > scratch, like below: > > def startswith(self, prefix, start=-1, end=-1): > assert start < 0, "not implemented" > assert end < 0, "not implemented" > if isinstance(prefix, (str, unicode)): > prefix = MyString(prefix) > n = len(prefix) > return self[0:n] == prefix > > but I found it's not a trivial task for myself to achive correctness > and completeness. It smells "reinventing the wheel" also, though I > can't find any hints in google and/or Python cookbook. > > I don't care efficiency as a starting point. Any comments are welcome. > Thanks. > The UTF-16 encoding is capable of representing the whole of Unicode. There should be no need to do anything special to use UTF-16. regards Steve -- Steve Holden +44 150 684 7255 +1 800 494 3119 Holden Web LLC www.holdenweb.com PyCon TX 2006 www.python.org/pycon/ -- http://mail.python.org/mailman/listinfo/python-list