Pure python implementation of string-like class

2006-02-24 Thread Akihiro KAYAMA

Hi all.

I would like to ask how I can implement a string-like class using a tuple
or a list. Does anyone know of some example code for a pure Python
implementation of a string-like class?

This is because I am trying to use Python for text processing that
involves a large character set. As the character set is wider than
UTF-16 (U+10FFFF), I can't use Python's native unicode string class.

So I want to prepare my own string class, which provides convenient
string methods such as split, join, find and others like the usual string
class, but uses a sequence of integers as its internal representation
instead of a native string. Obviously, subclassing str doesn't
help.
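A minimal sketch of the design described above, written for Python 3 (which postdates this thread). The class name `CodeString` and its methods are hypothetical; it stores an immutable tuple of integer code points, so values above U+10FFFF are allowed. `find` is a naive search and `split` only handles an explicit, non-empty separator, so this is a starting point rather than a complete `str` replacement.

```python
class CodeString:
    """Hypothetical string-like class: an immutable tuple of int code points.

    Code points are plain ints, so nothing limits them to U+10FFFF.
    """

    __slots__ = ("_codes",)

    def __init__(self, codes=()):
        if isinstance(codes, CodeString):
            codes = codes._codes
        elif isinstance(codes, str):
            # Native strings are converted character by character.
            codes = [ord(ch) for ch in codes]
        self._codes = tuple(codes)

    def __len__(self):
        return len(self._codes)

    def __iter__(self):
        return iter(self._codes)

    def __getitem__(self, index):
        # Slicing keeps the type; plain indexing returns an int code point.
        if isinstance(index, slice):
            return CodeString(self._codes[index])
        return self._codes[index]

    def __eq__(self, other):
        if isinstance(other, (str, CodeString)):
            return self._codes == CodeString(other)._codes
        return NotImplemented

    def __hash__(self):
        return hash(self._codes)

    def __add__(self, other):
        return CodeString(self._codes + CodeString(other)._codes)

    def __repr__(self):
        return "CodeString(%r)" % (list(self._codes),)

    def find(self, sub, start=0):
        """Naive O(n*m) search from a non-negative start; -1 if absent."""
        sub = CodeString(sub)._codes
        n, m = len(self._codes), len(sub)
        for i in range(start, n - m + 1):
            if self._codes[i:i + m] == sub:
                return i
        return -1

    def split(self, sep):
        """Split on a non-empty separator, like str.split(sep)."""
        sep = CodeString(sep)
        if not sep:
            raise ValueError("empty separator")
        parts, pos = [], 0
        while True:
            hit = self.find(sep, pos)
            if hit < 0:
                parts.append(self[pos:])
                return parts
            parts.append(self[pos:hit])
            pos = hit + len(sep)

    def join(self, items):
        """Concatenate items with self between them, like str.join."""
        result = []
        for i, item in enumerate(items):
            if i:
                result.extend(self._codes)
            result.extend(CodeString(item)._codes)
        return CodeString(result)
```

Because `__eq__` accepts native `str` values, ASCII-only data can be compared directly, e.g. `CodeString("a,b").split(",") == ["a", "b"]`.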

The implementation of each string method in the Python source
tree (stringobject.c) is far from Python code, so I have started from
scratch, like this:

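def startswith(self, prefix, start=-1, end=-1):
    # Method of MyString; the start/end arguments are not supported yet.
    assert start < 0, "not implemented"
    assert end < 0, "not implemented"
    # Promote native strings so both sides compare as MyString.
    if isinstance(prefix, (str, unicode)):
        prefix = MyString(prefix)
    n = len(prefix)
    return self[0:n] == prefix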

but I found it's not a trivial task for me to achieve correctness
and completeness. It also smells of "reinventing the wheel", though I
can't find any hints on Google or in the Python Cookbook.
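One way to fill in the missing `start`/`end` support, sketched as a hypothetical free function (`seq_startswith`) in Python 3 that a `MyString.startswith` method could delegate to. It assumes both arguments are sequences of integer code points and reuses `slice.indices()` so that negative and out-of-range offsets are normalized the same way slicing does. `None` is used as the "not given" default because `-1` is a legitimate index. It does not attempt to reproduce every edge case of `str.startswith` (tuples of prefixes, for example).

```python
def seq_startswith(codes, prefix, start=None, end=None):
    """Hypothetical helper: startswith for sequences of int code points."""
    # Normalize start/end like slice notation: negative values count from
    # the end, out-of-range values are clipped to the sequence bounds.
    start, end, _ = slice(start, end).indices(len(codes))
    n = len(prefix)
    if end - start < n:
        # The window [start, end) is too short to contain the prefix.
        return False
    return tuple(codes[start:start + n]) == tuple(prefix)
```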

I don't care about efficiency as a starting point. Any comments are welcome.
Thanks.

-- kayama
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Pure python implementation of string-like class

2006-02-25 Thread Akihiro KAYAMA
Hi bearophile.

In article <[EMAIL PROTECTED]>,
[EMAIL PROTECTED] writes:

bearophileHUGS> Maybe you can create your class using an array of 'L' with the
bearophileHUGS> array standard module.

Thanks for your suggestion. I'm currently using an ordinary list as the
internal representation. As I understand it, compared with a list, the
array module offers efficiency but no convenient functions for
implementing the various string methods. As Python's list is already fast
enough, I want to speed up my coding work first.
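A small Python 3 sketch of what bearophile's suggestion gives for free and what still has to be written by hand. The typecode `'L'` (unsigned long) is documented as at least 4 bytes, so it can hold 31-bit values; the sample values and the `split_on` helper are hypothetical, and `split_on` only splits on a single separator code.

```python
from array import array

# 'L' is unsigned long: at least 32 bits, so 31-bit UCS-4 values fit.
text = array("L", [0x4E00, 0x60000001, 0x4E8C, 0x60000001, 0x4E09])

# Slicing, concatenation, comparison, count() and `in` come for free ...
head = text[:2]            # still an array('L', ...)
seps = text.count(0x60000001)

# ... but string-style methods such as split must be written by hand.
def split_on(codes, sep_code):
    """Hypothetical helper: split an array on one separator code point."""
    parts, current = [], array(codes.typecode)
    for c in codes:
        if c == sep_code:
            parts.append(current)
            current = array(codes.typecode)
        else:
            current.append(c)
    parts.append(current)
    return parts
```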

-- kayama


Re: Pure python implementation of string-like class

2006-02-25 Thread Akihiro KAYAMA
Hi And.

In article <[EMAIL PROTECTED]>,
[EMAIL PROTECTED] writes:

and-google> Akihiro KAYAMA wrote:
and-google> > As the character set is wider than UTF-16(U+10), I can't use
and-google> > Python's native unicode string class.
and-google> 
and-google> Have you tried using Python compiled in Wide Unicode mode
and-google> (--enable-unicode=ucs4)? You get native UTF-32/UCS-4 strings then,
and-google> which should be enough for most purposes.

From my quick survey, Python's Unicode support is intentionally
restricted to the UTF-16 range (U+0000...U+10FFFF), regardless of the
--enable-unicode=ucs4 option.

> Python 2.4.1 (#2, Sep  3 2005, 22:35:47) 
> [GCC 2.95.4 20020320 [FreeBSD]] on freebsd4
> Type "help", "copyright", "credits" or "license" for more information.
> >>> u"\U0010FFFF"
> u'\U0010ffff'
> >>> len(u"\U0010FFFF")
> 1
> >>> u"\U00110000"
> UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 0-9: illegal Unicode character

A simple patch to unicodeobject.c that disables the unicode range check
could solve this, but I don't want to maintain a specialized Python
binary for my project.
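For comparison, a quick check in Python 3 (not available when this was posted) that the same ceiling still exists there. Since Python 3.3 (PEP 393) there is no narrow/wide build distinction, but `sys.maxunicode` is always 0x10FFFF and `chr()` rejects anything larger, so a larger codespace still needs a representation other than `str`.

```python
import sys

# Fixed at 0x10FFFF on every Python 3.3+ build, not a configure option.
print(hex(sys.maxunicode))   # 0x10ffff

# The highest valid code point is a single character ...
print(len(chr(0x10FFFF)))    # 1

# ... and one past it is rejected outright.
try:
    chr(0x110000)
except ValueError as exc:
    print("rejected:", exc)
```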

-- kayama


Re: Pure python implementation of string-like class

2006-02-25 Thread Akihiro KAYAMA
Hi Steve.

In article <[EMAIL PROTECTED]>,
Steve Holden <[EMAIL PROTECTED]> writes:

steve> Akihiro KAYAMA wrote:
steve> > Hi all.
steve> > 
steve> > I would like to ask how I can implement string-like class using tuple
steve> > or list. Does anyone know about some example codes of pure python
steve> > implementation of string-like class?
steve> > 
steve> > Because I am trying to use Python for a text processing which is
steve> > composed of a large character set. As the character set is wider than
steve> > UTF-16(U+10FFFF), I can't use Python's native unicode string class.
steve> > 
steve> "Wider than UTF-16" doesn't make sense.

Sorry for my terrible English. I live in Japan, and we have a
large number of characters called Kanji. UTF-16 (U+0000...U+10FFFF) is
enough for practical use in this country too, but for academic
purposes I need a large codespace of more than 20 bits. I wish I could
use the private-use space of 31-bit UCS-4 (U+60000000...U+7FFFFFFF) in Python.
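To put the numbers in perspective, a short Python 3 sketch listing the private-use ranges that Unicode does define below U+10FFFF and checking their endpoints with `unicodedata.category`, which reports `Co` (private use) for these code points. Together they provide 137,468 code points, far fewer than the codespace described above.

```python
import unicodedata

# Private-use ranges that Unicode (and therefore Python's str) provides.
PRIVATE_USE_RANGES = [
    (0xE000, 0xF8FF),      # BMP Private Use Area
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (plane 16)
]

for first, last in PRIVATE_USE_RANGES:
    # General category "Co" means "Other, private use".
    assert unicodedata.category(chr(first)) == "Co"
    assert unicodedata.category(chr(last)) == "Co"

total = sum(last - first + 1 for first, last in PRIVATE_USE_RANGES)
print(total)  # 137468
```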

-- kayama


Re: Pure python implementation of string-like class

2006-02-26 Thread Akihiro KAYAMA

Hi Ross. 

Thanks a lot for clarifying. I didn't think my post could start a
Unicode flame war.

I don't know whether this mailing list is the right place to talk about
Unicode issues, but for me, the roughly one million code points that UTF-16
provides are not enough. That scheme presumes that the same character always
has the same code point. But unlike the simple and beautiful Roman alphabet,
it is sometimes difficult to decide whether two kanji characters are "the
same" or not, because their glyphs vary for various reasons (e.g. who wrote
them, and when and where they were written). So first of all we assign code
points, and then we consider that "this character, which appears in this
Chinese historical book, may be the same character as this character in
Unicode CJK Extension A". Identifying characters in this way is also one of
my project's tasks. I think this explains why UTF-16 is enough for the
majority but not for everyone.

Anyway, I suppose that implementing string-like classes is a generic
Python issue. For example, it would be useful if a rich text class,
which has style attributes such as bold on each character, also had
string-like methods and could be handled like a string.
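A hypothetical sketch (Python 3) of such a rich text class: `RichText` keeps one set of style names per character alongside the characters. Pure queries such as `find` can delegate to a plain `str`, while methods that build new text, like `split`, are implemented by slicing so the styles travel with the characters. The class name, the `style_at` method and the frozenset-of-names style model are all assumptions made for illustration.

```python
class RichText:
    """Hypothetical rich-text string: one frozenset of styles per character."""

    def __init__(self, text="", style=frozenset()):
        self.chars = list(text)
        self.styles = [frozenset(style)] * len(self.chars)

    @classmethod
    def _from_parts(cls, chars, styles):
        obj = cls()
        obj.chars, obj.styles = list(chars), list(styles)
        return obj

    def __str__(self):
        return "".join(self.chars)

    def __len__(self):
        return len(self.chars)

    def __getitem__(self, index):
        # Slices keep their per-character styles.
        if isinstance(index, slice):
            return self._from_parts(self.chars[index], self.styles[index])
        return self.chars[index]

    def __add__(self, other):
        return self._from_parts(self.chars + other.chars,
                                self.styles + other.styles)

    def style_at(self, index):
        return self.styles[index]

    # Plain-text queries can delegate to str: indices line up one-to-one.
    def find(self, sub, start=0):
        return str(self).find(str(sub), start)

    # Methods that produce new text slice self, so styles survive.
    def split(self, sep):
        if not sep:
            raise ValueError("empty separator")
        parts, pos, plain = [], 0, str(self)
        while True:
            hit = plain.find(sep, pos)
            if hit < 0:
                parts.append(self[pos:])
                return parts
            parts.append(self[pos:hit])
            pos = hit + len(sep)
```

For example, `(RichText("bold ", {"bold"}) + RichText("plain")).split(" ")` yields two pieces, and the first keeps its `bold` style.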

In article <[EMAIL PROTECTED]>,
"Ross Ridge" <[EMAIL PROTECTED]> writes:

rridge> thinking about it, it might actually make sense to use strings as the
rridge> internal representation as a lot of operations can be implemented by using
rridge> the standard string operation but multiplying the offsets and lengths by
rridge> 4.

Ah, COOL! It sounds very nice. I'll try it.
Thanks again.
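A minimal Python 3 sketch of Ross Ridge's idea, assuming a fixed width of 4 bytes per code point packed big-endian with `struct`. The helper names `pack`, `unpack` and `packed_find` are hypothetical. One catch worth handling: a raw `bytes.find` can match across the boundary between two code points, so the search below keeps looking until it finds a match at an offset divisible by 4, then divides by 4 to get the code-point index.

```python
import struct

WIDTH = 4  # bytes per code point (assumption: 32-bit, big-endian)

def pack(codes):
    """Encode a sequence of ints (each < 2**32) as fixed-width bytes."""
    return struct.pack(">%dI" % len(codes), *codes)

def unpack(data):
    """Decode fixed-width bytes back into a list of ints."""
    return list(struct.unpack(">%dI" % (len(data) // WIDTH), data))

def packed_find(data, sub, start=0):
    """bytes.find that only accepts matches on a code-point boundary.

    `start` and the result are code-point indices, not byte offsets.
    """
    pos = start * WIDTH
    while True:
        hit = data.find(sub, pos)
        if hit < 0:
            return -1
        if hit % WIDTH == 0:
            return hit // WIDTH
        pos = hit + 1  # misaligned hit straddles two code points; keep going
```

Slicing follows the same rule: code points `i` to `j` are `data[WIDTH * i:WIDTH * j]`.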

-- kayama


fiber(cooperative multi-threading)

2007-12-22 Thread Akihiro KAYAMA

Hi all.

I have found cooperative multi-threading (only one thread runs at a time,
with explicit thread switching) useful for writing some simulators.
With it, I'm free from annoying mutual exclusion, and the results
are deterministic.

For this purpose, and inspired by Ruby (1.9) fibers, I wrote my own
version of fibers in Python.

It just works, but using native Python threads for non-preemptive
threading is not cost-effective. Python has generators instead, but they
seem very restricted for general scripting. At the very least, I wish I
could easily write nested (generator) functions.

Is there any plan to implement real (lightweight) fibers in Python?


import threading

class Fiber(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)

        # A fiber runs only after its own semaphore has been released.
        self.semaphore_running = threading.Semaphore(0)
        self.semaphore_finish = None
        self.val = None

        self.setDaemon(True)
        # Start the thread now (it blocks at once in run()), then make
        # start() mean "switch to this fiber from the main thread".
        self.start()
        self.start = self.start_fiber

    def start_fiber(self):
        # Called from the main thread: wake this fiber, wait for it to end.
        self.semaphore_finish = threading.Semaphore(0)
        self.semaphore_running.release()
        self.semaphore_finish.acquire()

    def run(self):  # override
        self.semaphore_running.acquire()
        self.main()
        if self.semaphore_finish is not None:
            self.semaphore_finish.release()

    def switchto(self, fiber, val=None):
        # Hand val to the target, wake it, then sleep until some fiber
        # switches back; return the value that fiber passed.
        fiber.val = val
        fiber.semaphore_running.release()
        self.semaphore_running.acquire()
        return self.val

    def main(self): # should be overridden
        pass

class F1(Fiber):
    def main(self):
        print "f1 start"
        self.switchto(f2)
        print "f1 foo"
        v = self.switchto(f2)
        print "f1 v=%s world" % v
        self.switchto(f2, "OK")
        print "f1 end"

class F2(Fiber):
    def main(self):
        print "f2 start"
        self.switchto(f1)
        print "f2 bar"
        result = self.switchto(f1, "Hello, ")
        print "f2 result=%s" % result
        print "f2 end"
        self.switchto(f1)

f1 = F1()
f2 = F2()

print "start"
f1.start()
print "end"

-- kayama


Re: fiber(cooperative multi-threading)

2007-12-23 Thread Akihiro KAYAMA

Thanks for your replies.

In article <[EMAIL PROTECTED]>,
Arnaud Delobelle <[EMAIL PROTECTED]> writes:

arnodel> def f1():
arnodel>     print "f1 start"
arnodel>     yield f2,
arnodel>     print "f1 foo"
arnodel>     v = yield f2,
arnodel>     print "f1 v=%s world" % v
arnodel>     yield f2, "OK"
arnodel>     print "f1 end"
arnodel> 
arnodel> def f2():
arnodel>     print "f2 start"
arnodel>     yield f1,
arnodel>     print "f2 bar"
arnodel>     result = yield f1, "Hello, "
arnodel>     print "f2 result=%s" % result
arnodel>     print "f2 end"
arnodel>     yield f1,
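Arnaud's version needs a small driver that resumes whichever generator the current one names. A minimal sketch of such a trampoline in Python 3; the `run` function, the `say`/`log` helpers and the `g1`/`g2` names are illustrative assumptions, not code from the thread. Each generator yields a tuple `(target,)` or `(target, value)`, and the driver sends `value` to `target`. Because a just-started generator can only receive `None`, the first switch to each coroutine must not carry a value, which holds in this example.

```python
def run(start):
    """Hypothetical trampoline: resume the generator each yield names."""
    current, value = start, None
    while True:
        try:
            request = current.send(value)
        except StopIteration:
            return  # the running coroutine finished
        # (target,) -> value None; (target, v) -> value v
        current, value = (request + (None,))[:2]

log = []

def say(msg):
    # Print and record, so the interleaving can be inspected afterwards.
    print(msg)
    log.append(msg)

def f1():
    say("f1 start")
    yield g2,
    say("f1 foo")
    v = yield g2,
    say("f1 v=%s world" % v)
    yield g2, "OK"
    say("f1 end")

def f2():
    say("f2 start")
    yield g1,
    say("f2 bar")
    result = yield g1, "Hello, "
    say("f2 result=%s" % result)
    say("f2 end")
    yield g1,

g1, g2 = f1(), f2()  # the coroutines refer to these generator objects
run(g1)
```

The printed order matches the thread-based `Fiber` version, without any OS threads.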

This is the simplest example. In real programming, things are more
complicated, so I would want to refactor it like this:

def foo(fiber, s, arg=None):
    print s
    # What I would like to write; Python 2 forbids return-with-value here.
    return (yield fiber, arg)

def f1():
    foo(f2, "start")    # XXX returns generator object
    v = foo(f2, "foo")
    foo(f2, "v=%s world" % v, "OK")

But the current Python generator specification requires me to write:

def f1():
    for x in foo(f2, "start"): yield x
    for x in foo(f2, "foo"): yield x
    # XXX v = ... (I don't know how to do this)
    for x in foo(f2, "v=%s world" % v, "OK"): yield x

I think this is not straightforward. The single-level restriction that
generators impose makes them impractical for real use.
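Python 3.3 later added `yield from` (PEP 380), which addresses exactly this refactoring problem: it delegates to a sub-generator, passes sent values through to it, and evaluates to the sub-generator's return value. A sketch of the desired `foo`/`f1` refactoring under that feature, with a minimal hypothetical trampoline `run` (coroutines yield `(target, value)` pairs) and a `log` list in place of `print` so the interleaving is easy to check. It still needs a generator-aware driver, so it is not a drop-in substitute for real fibers or greenlets.

```python
def run(start):
    """Hypothetical trampoline: each coroutine yields (target, value)."""
    current, value = start, None
    while True:
        try:
            current, value = current.send(value)
        except StopIteration:
            return

log = []

def foo(fiber, s, arg=None):
    log.append(s)
    # Whatever is sent back on the next switch becomes foo's return value.
    return (yield fiber, arg)

def f1():
    yield from foo(g2, "f1 start")
    v = yield from foo(g2, "f1 foo")
    yield from foo(g2, "f1 v=%s world" % v, "OK")
    log.append("f1 end")

def f2():
    yield from foo(g1, "f2 start")
    result = yield from foo(g1, "f2 bar", "Hello, ")
    log.append("f2 result=%s" % result)
    yield from foo(g1, "f2 end")

g1, g2 = f1(), f2()
run(g1)
```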

In article <[EMAIL PROTECTED]>,
Duncan Booth <[EMAIL PROTECTED]> writes:

duncan.booth> Unfortunately generators only save a single level of stack-frame, so they
duncan.booth> are not really a replacement for fibers/coroutines. The OP should perhaps
duncan.booth> look at Stackless Python or Greenlets. See
duncan.booth> http://codespeak.net/py/dist/greenlet.html

I would be happy if I could use convenient coroutine features via a standard
or simple extension library. py.magic.greenlet may be what I'm
looking for, but I wonder why it is named "magic" :-)

-- kayama