Hi all,

I am looking for advice on how to use unicode with curses. First I will explain my understanding of how curses deals with keyboard input and how it differs from what I would like.

The curses module has a window.getch() function to capture keyboard input. This function returns an integer which is more or less:

* a byte if the key which was pressed is a printable character (e.g. a, F, &);
* an integer > 255 if it is a special key, e.g. if you press KEY_UP it returns 259.

As far as I know, curses is totally unicode unaware, so if the key pressed is printable but not ASCII, the getch() function will return one or more bytes depending on the encoding in the terminal.

E.g. given utf-8 encoding, if I press the key 'é' on my keyboard (which is encoded as '\xc3\xa9' in utf-8), I will need two calls to getch() to get this: the first one will return 0xC3 and the second one 0xA9.

Instead of getting a stream of bytes and special keycodes (with value > 255) from getch(), what I want is a stream of *unicode characters* and special keycodes.

So, still assuming utf-8 encoding in the terminal, if I type:

    Té[KEY_UP]ça

iterating calls to the getch() function will give me this sequence of integers:

    84, 195, 169, 259,    195, 167, 97
    T-  é-------  KEY_UP  ç-------  a-

But what I want is to get this stream instead:

    u'T', u'é', 259, u'ç', u'a'

I can't pipe the stream of output from getch() directly through an instance of codecs.getreader('utf-8') because getch() sometimes returns the integer values of the 'special keys'.

Now I will present to you the solution I have come up with so far. I am really unsure whether it is a good way to solve this problem, as both unicode and curses still feel quite mysterious to me. What I would appreciate is some advice on how to do it better, or someone to point out that I have a gross misunderstanding of what is going on!

This has been tested in Python 2.5.

-------------------- uctest.py ------------------------------
# -*- coding:utf-8 -*-

import codecs
import curses

# This gives the return codes given by curses.window.getch() when
# "Té[KEY_UP]ça" is typed in a terminal with utf-8 encoding:

codes = map(ord, "Té") + [curses.KEY_UP] + map(ord, "ça")

# This class defines a file-like object from a curses window 'win'
# whose read() function will return the next byte (as a character)
# given by win.getch() if it's a byte or return the empty string and
# set the code attribute to the value of win.getch().
# It is not used in this test; the Stream class below is used
# instead.

class CursesStream(object):
    def __init__(self, win):
        # Bind the window's getch directly (there is no self.win here).
        self.getch = win.getch
    def read(self):
        c = self.getch()
        if c == -1:
            # No input available: signal end of stream.
            self.code = None
            return ''
        elif c > 255:
            # Special key: remember its code, return no byte.
            self.code = c
            return ''
        else:
            return chr(c)

# This class simulates CursesStream above with a predefined list of
# keycodes to return - handy for testing.

class Stream(object):
    def __init__(self, codes):
        self.codes = iter(codes)
    def read(self):
        try:
            c = self.codes.next()
        except StopIteration:
            self.code = None
            return ''
        if c > 255:
            self.code = c
            return ''
        else:
            return chr(c)

def getkeys(stream, encoding):
    '''Given a CursesStream object and an encoding, yield the decoded
    unicode characters and special keycodes that curses sends'''
    read = codecs.getreader(encoding)(stream).read
    while True:
        c = read()
        if c:
            yield c
        elif stream.code is None:
            return
        else:
            yield stream.code

# Test getkeys with the simulated stream

for c in getkeys(Stream(codes), 'utf-8'):
    if isinstance(c, unicode):
        print 'Char\t', c
    else:
        print 'Code\t', c

-------------------- running uctest.py ------------------------------
marigold:junk arno$ python uctest.py
Char    T
Char    é
Code    259
Char    ç
Char    a

Thanks if you have read this far!

-- 
Arnaud

-- 
http://mail.python.org/mailman/listinfo/python-list
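
For comparison, the same stream can probably also be decoded without a file-like wrapper, by feeding getch()-style integers one byte at a time into an incremental decoder from the codecs module. This is only a minimal sketch under a few assumptions: decode_keys is a hypothetical helper name (it is not part of curses), the value 259 stands in for KEY_UP as in the example above, negative return values are taken to mean "no input", and a half-received character is simply discarded when a special key arrives. With the default strict error handling, invalid input bytes would raise UnicodeDecodeError.

```python
import codecs
import struct


def decode_keys(codes, encoding='utf-8'):
    """Yield decoded characters and special keycodes from an iterable of
    getch()-style integers (hypothetical helper, not part of curses)."""
    decoder = codecs.getincrementaldecoder(encoding)()
    for c in codes:
        if c < 0:
            # getch() returns -1 when no input is pending in no-delay mode.
            continue
        if c > 255:
            # Special key: drop any half-received character (an assumption
            # of this sketch) and pass the keycode through unchanged.
            decoder.reset()
            yield c
        else:
            # Feed a single byte; the decoder returns an empty string
            # until a complete character has been received.
            char = decoder.decode(struct.pack('B', c))
            if char:
                yield char


# The codes from the example above: T, é (two bytes), KEY_UP, ç (two bytes), a
codes = [84, 195, 169, 259, 195, 167, 97]
keys = list(decode_keys(codes))
```

With a real window, the iterable could be something like iter(win.getch, -1) in no-delay mode, though that has not been tried here.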