Pure python implementation of string-like class
Hi all.

I would like to ask how I can implement a string-like class using a tuple or list. Does anyone know of example code for a pure Python implementation of a string-like class?

I am trying to use Python for text processing that involves a large character set. As the character set is wider than UTF-16 (U+10), I can't use Python's native unicode string class.

So I want to prepare my own string class, which provides convenient string methods such as split, join, find and others like the usual string class, but uses a sequence of integers as its internal representation instead of a native string. Obviously, subclassing str doesn't help.

The implementation of each string method in the Python source tree (stringobject.c) is far from Python code, so I have started from scratch, like below:

    def startswith(self, prefix, start=-1, end=-1):
        assert start < 0, "not implemented"
        assert end < 0, "not implemented"
        if isinstance(prefix, (str, unicode)):
            prefix = MyString(prefix)
        n = len(prefix)
        return self[0:n] == prefix

but I found that achieving correctness and completeness is not a trivial task for me. It also smells like reinventing the wheel, though I can't find any hints on Google or in the Python Cookbook.

I don't care about efficiency as a starting point. Any comments are welcome. Thanks.

-- kayama
-- http://mail.python.org/mailman/listinfo/python-list
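A minimal sketch of the kind of class described above, written for Python 3 rather than the Python 2 of the original post. It keeps a tuple of integer code points and borrows the read-only sequence behaviour (iteration, `in`, `index`, `count`) from `collections.abc.Sequence`. Only `MyString` and `startswith` come from the post; the other method bodies are assumptions for illustration, and edge cases such as empty substrings or out-of-range offsets have not been matched against `str`.

```python
from collections.abc import Sequence


class MyString(Sequence):
    """Immutable sequence of integer code points with a few str-like methods."""

    def __init__(self, codes=()):
        # Accept a native string, another MyString, or any iterable of ints.
        if isinstance(codes, str):
            codes = (ord(ch) for ch in codes)
        self._codes = tuple(codes)

    def __len__(self):
        return len(self._codes)

    def __getitem__(self, index):
        # Slices stay string-like; single indexes return the integer itself.
        if isinstance(index, slice):
            return MyString(self._codes[index])
        return self._codes[index]

    def __eq__(self, other):
        if isinstance(other, str):
            other = MyString(other)
        if not isinstance(other, MyString):
            return NotImplemented
        return self._codes == other._codes

    def __hash__(self):
        return hash(self._codes)

    def __repr__(self):
        return "MyString(%r)" % (list(self._codes),)

    def find(self, sub, start=0, end=None):
        # Naive scan; returns the first index of sub, or -1.
        sub = MyString(sub)
        start, end, _ = slice(start, end).indices(len(self))
        n = len(sub)
        for i in range(start, end - n + 1):
            if self._codes[i:i + n] == sub._codes:
                return i
        return -1

    def startswith(self, prefix, start=0, end=None):
        prefix = MyString(prefix)
        start, end, _ = slice(start, end).indices(len(self))
        stop = start + len(prefix)
        return stop <= end and self._codes[start:stop] == prefix._codes

    def split(self, sep):
        # Only the explicit-separator form of str.split is sketched here.
        sep = MyString(sep)
        if not len(sep):
            raise ValueError("empty separator")
        parts, pos = [], 0
        while True:
            hit = self.find(sep, pos)
            if hit < 0:
                parts.append(self[pos:])
                return parts
            parts.append(self[pos:hit])
            pos = hit + len(sep)

    def join(self, items):
        out = []
        for k, item in enumerate(items):
            if k:
                out.extend(self._codes)
            out.extend(MyString(item)._codes)
        return MyString(out)
```

Because every method converts its argument through `MyString(...)`, callers can pass native strings, lists of integers, or other `MyString` objects interchangeably, which keeps each method body close to the plain tuple operation it wraps.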
Re: Pure python implementation of string-like class
Hi bearophile.

In article <[EMAIL PROTECTED]>, [EMAIL PROTECTED] writes:

bearophileHUGS> Maybe you can create your class using an array of 'L' with the array
bearophileHUGS> standard module.

Thanks for your suggestion. I'm currently using an ordinary list as the internal representation. As I understand it, compared to a list, the array module offers efficiency but no convenient functions for implementing the various string methods. As Python's list is already fast enough, I want to speed up my coding work first.

-- kayama
Re: Pure python implementation of string-like class
Hi And.

In article <[EMAIL PROTECTED]>, [EMAIL PROTECTED] writes:

and-google> Akihiro KAYAMA wrote:
and-google> > As the character set is wider than UTF-16(U+10), I can't use
and-google> > Python's native unicode string class.
and-google>
and-google> Have you tried using Python compiled in Wide Unicode mode
and-google> (--enable-unicode=ucs4)? You get native UTF-32/UCS-4 strings then,
and-google> which should be enough for most purposes.

From my quick survey, Python's Unicode support is intentionally restricted to the UTF-16 range (U+...U+10), regardless of the --enable-unicode=ucs4 option.

> Python 2.4.1 (#2, Sep 3 2005, 22:35:47)
> [GCC 2.95.4 20020320 [FreeBSD]] on freebsd4
> Type "help", "copyright", "credits" or "license" for more information.
> >>> u"\U0010"
> u'\U0010'
> >>> len(u"\U0010")
> 1
> >>> u"\U0011"
> UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 0-9:
> illegal Unicode character

A simple patch to unicodeobject.c that disables the Unicode range check could solve this, but I don't want to maintain a specialized Python binary for my project.

-- kayama
Re: Pure python implementation of string-like class
Hi Steve.

In article <[EMAIL PROTECTED]>, Steve Holden <[EMAIL PROTECTED]> writes:

steve> Akihiro KAYAMA wrote:
steve> > Hi all.
steve> >
steve> > I would like to ask how I can implement string-like class using tuple
steve> > or list. Does anyone know about some example codes of pure python
steve> > implementation of string-like class?
steve> >
steve> > Because I am trying to use Python for a text processing which is
steve> > composed of a large character set. As the character set is wider than
steve> > UTF-16(U+10), I can't use Python's native unicode string class.
steve> >
steve> "Wider than UTF-16" doesn't make sense.

Sorry for my terrible English. I live in Japan, and we have a large number of characters called Kanji. UTF-16 (U+...U+10) is enough for practical use in this country too, but for academic purposes I need a code space larger than 20 bits. I wish I could use Unicode's private space (U+6000...U+7FFF) in Python.

-- kayama
Re: Pure python implementation of string-like class
Hi Ross.

Thanks a lot for your clarification. I didn't think my post could start a Unicode flame.

I don't know whether this mailing list is the right place to talk about Unicode issues, but for me, the million-odd code space that UTF-16 provides is not enough. It presumes that the same character always has the same code point. But unlike the simple and beautiful Roman alphabet, it is sometimes difficult to decide whether two kanji characters are the "same" or not, because a glyph varies for many reasons (e.g. who wrote it, and when and where it was written). So first of all we assign code points, and then we consider that "this character which appears in this Chinese historical book may be the same character as this character in Unicode CJK Extension A". Identifying characters in this way is also one of my project's tasks. I think this explains why UTF-16 is enough for the majority but not for everyone.

Anyway, I suppose that implementing string-like classes is a generic Python issue. For example, it would be useful if a rich text class that carries style attributes such as bold on each character also had string-like methods and could be handled like a string.

In article <[EMAIL PROTECTED]>, "Ross Ridge" <[EMAIL PROTECTED]> writes:

rridge> thinking about it, it might actually make sense to use strings as the
rridge> internal representation as a lot of operations can be implemented by using
rridge> the standard string operation but multiplying the offsets and lengths by
rridge> 4.

Ah, COOL! It sounds very nice. I'll try it. Thanks again.

-- kayama
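Ross's suggestion quoted above can be sketched roughly as follows. This is an illustration under assumptions, not code from the thread: it targets Python 3, stores each code point as four big-endian bytes in a `bytes` object, and the helper names `pack`, `unpack`, `substr` and `find` are hypothetical. The one subtlety is that a byte-level search can match across a code-point boundary, so hits at offsets that are not a multiple of 4 have to be skipped.

```python
import struct

WIDTH = 4  # bytes per code point (assumed fixed-width, big-endian, unsigned)


def pack(codes):
    """Encode an iterable of integer code points as fixed-width bytes."""
    codes = list(codes)
    return struct.pack(">%dI" % len(codes), *codes)


def unpack(data):
    """Decode fixed-width bytes back into a list of integer code points."""
    return list(struct.unpack(">%dI" % (len(data) // WIDTH), data))


def substr(data, start, end):
    """Slice by code-point index: offsets are simply multiplied by WIDTH."""
    return data[start * WIDTH:end * WIDTH]


def find(haystack, needle, start=0):
    """Return the code-point index of needle in haystack, or -1."""
    pos = haystack.find(needle, start * WIDTH)
    while pos != -1 and pos % WIDTH:
        # Misaligned hit: resume at the next code-point boundary.
        pos = haystack.find(needle, pos - pos % WIDTH + WIDTH)
    return pos // WIDTH if pos != -1 else -1
```

The searching itself is delegated to the built-in `bytes.find`, which is the point of the suggestion; only the index arithmetic and the alignment check are written by hand.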
fiber(cooperative multi-threading)
Hi all.

I found that cooperative multi-threading (only one thread runs at a time, with explicit thread switching) is useful for writing some simulators. With it, I am free from annoying mutual exclusion, and results become deterministic.

For this purpose, and inspired by Ruby (1.9) fibers, I wrote my own version of a fiber in Python.

It just works, but using native Python threads for non-preemptive threading is not cost-effective. Python has generators instead, but they seem very restricted for general scripting. I wish I could at least write nested (generator) functions easily.

Is there any plan to implement real (lightweight) fibers in Python?

    import threading

    class Fiber(threading.Thread):
        def __init__(self):
            threading.Thread.__init__(self)
            self.semaphore_running = threading.Semaphore(0)
            self.semaphore_finish = None
            self.val = None
            self.setDaemon(True)
            self.start()
            self.start = self.start_fiber

        def start_fiber(self):
            self.semaphore_finish = threading.Semaphore(0)
            self.semaphore_running.release()
            self.semaphore_finish.acquire()

        def run(self):              # override
            self.semaphore_running.acquire()
            self.main()
            if self.semaphore_finish is not None:
                self.semaphore_finish.release()

        def switchto(self, fiber, val=None):
            fiber.val = val
            fiber.semaphore_running.release()
            self.semaphore_running.acquire()
            return self.val

        def main(self):             # should be overridden
            pass

    class F1(Fiber):
        def main(self):
            print "f1 start"
            self.switchto(f2)
            print "f1 foo"
            v = self.switchto(f2)
            print "f1 v=%s world" % v
            self.switchto(f2, "OK")
            print "f1 end"

    class F2(Fiber):
        def main(self):
            print "f2 start"
            self.switchto(f1)
            print "f2 bar"
            result = self.switchto(f1, "Hello, ")
            print "f2 result=%s" % result
            print "f2 end"
            self.switchto(f1)

    f1 = F1()
    f2 = F2()

    print "start"
    f1.start()
    print "end"

-- kayama
Re: fiber(cooperative multi-threading)
Thanks for your replies.

In article <[EMAIL PROTECTED]>, Arnaud Delobelle <[EMAIL PROTECTED]> writes:

arnodel> def f1():
arnodel>     print "f1 start"
arnodel>     yield f2,
arnodel>     print "f1 foo"
arnodel>     v = yield f2,
arnodel>     print "f1 v=%s world" % v
arnodel>     yield f2, "OK"
arnodel>     print "f1 end"
arnodel>
arnodel> def f2():
arnodel>     print "f2 start"
arnodel>     yield f1,
arnodel>     print "f2 bar"
arnodel>     result = yield f1, "Hello, "
arnodel>     print "f2 result=%s" % result
arnodel>     print "f2 end"
arnodel>     yield f1,

This is the simplest example. In real programming, things are more complicated, so I will want to refactor it like below:

    def foo(fiber, s, arg=None):
        print s
        return yield fiber, arg

    def f1():
        foo(f2, "start")    # XXX returns generator object
        v = foo(f2, "foo")
        foo(f2, "v=%s world" % v, "OK")

But the current Python generator specification requires me to write:

    def f1():
        for x in foo(f2, "start"): yield x
        for x in foo(f2, "foo"): yield x
        # XXX v = ... (I don't know how to do this)
        for x in foo(f2, "v=%s world" % v, "OK"): yield x

I think this is not straightforward. The single-level restriction that generators impose is impractical for real use.

In article <[EMAIL PROTECTED]>, Duncan Booth <[EMAIL PROTECTED]> writes:

duncan.booth> Unfortunately generators only save a single level of stack-frame, so they
duncan.booth> are not really a replacement for fibers/coroutines. The OP should perhaps
duncan.booth> look at Stackless Python or Greenlets. See
duncan.booth> http://codespeak.net/py/dist/greenlet.html

I would be happy if I could use convenient coroutine features via a standard or simple extension library. py.magic.greenlet may be what I'm looking for, but I wonder why it is named "magic" :-)

-- kayama
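For readers on later Python versions: `yield from` (PEP 380, Python 3.3 and up) delegates to a nested generator and evaluates to its return value, which covers the refactoring asked for above. The sketch below is an illustration, not code from the thread. The helper `step` plays the role of `foo`, the `run` trampoline is a hypothetical stand-in for whatever scheduler drives the generators, and messages are collected in a `log` list instead of being printed so the order of execution is easy to inspect.

```python
log = []


def step(target, message, value=None):
    """Record a message, switch to `target`, and return the value passed back."""
    log.append(message)
    received = yield target, value
    return received


def f1():
    yield from step(f2, "f1 start")
    v = yield from step(f2, "f1 foo")
    yield from step(f2, "f1 v=%s world" % v, "OK")
    log.append("f1 end")


def f2():
    yield from step(f1, "f2 start")
    result = yield from step(f1, "f2 bar", "Hello, ")
    log.append("f2 result=%s" % result)
    log.append("f2 end")
    yield f1, None


def run(start):
    """Trampoline: resume whichever generator function the current one names."""
    fibers = {}
    target, value = start, None
    while True:
        if target not in fibers:
            fibers[target] = target()  # first switch creates the generator
            value = None               # a fresh generator accepts only None
        try:
            target, value = fibers[target].send(value)
        except StopIteration:
            return  # the running fiber finished, so the whole run ends


run(f1)
print("\n".join(log))
```

One limitation of this sketch is that a value sent on the very first switch to a fiber is dropped, because a generator that has not started can only be sent `None`.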