"Vyz" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > Its a module to transliterate from telugu language written in roman > script into native unicode. right now its running in a browser window > at www.lekhini.org I Intend to translate it into python so that I can > integrate into other tools I have. I am able to pass arguments and get > output from the script also would be OK. or how about ways to wrap > these javascript functions with python.
Leaving aside the code the manipulated the display and user interaction, the code should be pretty straightforward logic (if-else statements) and table lookups, so translation to Python should be straightforward also. I checked parser.js. I don't know javascript but it looks to me like a mixture of C and Python. The for loop headers have to be rewritten, and the switch changed to if-elif. What looks different is the attachment as attributes of method functions to functions rather than classes. As for 'wrapping': can you get a standard javascript interpreter? If so, you could possibly adjust the js so you can pipe a roman string to the js program and have it pipe back the telegu unicode version. >> > I have a script with hundreds of lines of javascript spread accross 7 >> > files. Is there any tool out there to automatically or >> > semi-automatically translate the code into python. unicode.js is mostly a few hundred verbose lines like Unicode.codePoints[Padma.lang_TELUGU].letter_PHA = "\u0C2B"; that setup the translation dict. Because the object model is different, I suspect that these all need to be changed, but, I also suspect, in a mechanical way. If one were starting in Python, one might either just define a dict more compactly like TEL_uni = {letter_PHA:"\u0C2B", ...} *or* probably better, use the builtin unicodedata module as much as possible. >>> import unicodedata as u >>> pha = u.name(u'\u0c2b') >>> pha 'TELUGU LETTER PHA' >>> u.lookup(pha) u'\u0c2b' I don't know what you do with js statement like this: Unicode.toPadma[Unicode.codePoints[Padma.lang_TELUGU].misc_VIRAMA + Unicode.codePoints[Padma.lang_TELUGU].letter_KA] = Padma.vattu_KA; where a constant seems to be assigned to a sum. But whatever these do might correspond to the u.normalize function. This appears to be based on a generic Indian-script transliteration program (Padma), so there may be functions not really needed for Telegu. (I am familiar with Devanagri but know nothing of Telegu and its script except that it is Dravidian rather than Indo-European-Sanskritic.) Good luck. Terry Jan Reedy -- http://mail.python.org/mailman/listinfo/python-list