paul rubin added the comment: I'm not sure what you mean by "ditto for Lucene indexes". I wasn't planning to use C code. I was hoping to write Python code to parse those indexes, then found they use this weird encoding. Python's codec set is already fairly inclusive, so this codec sounded like a reasonably useful addition. It probably shows up in other places as well. It might even be a reasonable internal representation for Python, which, as I understand it, currently can't represent code points outside the BMP. It is also used in Java serialization, which I think of as a somewhat weird and wacky thing, but it's conceivable that somebody might someday want to write a Python program that speaks the Java serialization protocol (I don't have a good sense of whether that's feasible).
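Assuming the "weird encoding" is Java's "modified UTF-8" (the variant written by `java.io.DataOutputStream.writeUTF` and used for strings in Java serialization), the codec differs from standard UTF-8 in two ways: U+0000 is stored as the two bytes `C0 80`, and each character above U+FFFF is stored as a UTF-16 surrogate pair, with each surrogate encoded as its own 3-byte sequence. The following is a minimal pure-Python sketch under that assumption, not a standard-library API. The codec name `java-modified-utf-8` and all helper names are hypothetical. It handles only the string bytes themselves, not any length prefix that may surround them.

```python
import codecs
import re

# Hypothetical codec name, chosen for this sketch only.
CODEC_NAME = "java-modified-utf-8"

# Any code point above U+FFFF must be written as a UTF-16 surrogate pair.
_SUPPLEMENTARY = re.compile("[\U00010000-\U0010FFFF]")
# A high surrogate immediately followed by a low surrogate.
_SURROGATE_PAIR = re.compile("[\ud800-\udbff][\udc00-\udfff]")


def _split_pair(match) -> str:
    """Turn one supplementary character into its two UTF-16 surrogates."""
    offset = ord(match.group()) - 0x10000
    return chr(0xD800 + (offset >> 10)) + chr(0xDC00 + (offset & 0x3FF))


def _join_pair(match) -> str:
    """Combine a high/low surrogate pair back into one code point."""
    high, low = match.group()
    return chr(0x10000 + ((ord(high) - 0xD800) << 10) + (ord(low) - 0xDC00))


def encode_modified_utf8(text: str) -> bytes:
    # Split supplementary characters into surrogate pairs first.
    text = _SUPPLEMENTARY.sub(_split_pair, text)
    # 'surrogatepass' lets the built-in UTF-8 encoder emit each surrogate
    # as its own 3-byte sequence instead of raising an error.
    data = text.encode("utf-8", "surrogatepass")
    # U+0000 is written as the overlong pair C0 80, never as a raw 0x00.
    return data.replace(b"\x00", b"\xc0\x80")


def decode_modified_utf8(data) -> str:
    # bytes.decode() may hand a custom codec a memoryview, so copy to bytes.
    data = bytes(data)
    # 0xC0 can only ever be a lead byte, so this only rewrites the NUL form.
    data = data.replace(b"\xc0\x80", b"\x00")
    # Let the UTF-8 decoder accept 3-byte surrogates, then re-pair them.
    text = data.decode("utf-8", "surrogatepass")
    return _SURROGATE_PAIR.sub(_join_pair, text)


def _search(name: str):
    # Normalize here, since lookup-name normalization differs across versions.
    if name.lower().replace("-", "_").replace(" ", "_") != "java_modified_utf_8":
        return None
    # This sketch ignores the 'errors' argument; malformed input makes the
    # underlying UTF-8 decoder raise UnicodeDecodeError.
    return codecs.CodecInfo(
        encode=lambda s, errors="strict": (encode_modified_utf8(s), len(s)),
        decode=lambda b, errors="strict": (decode_modified_utf8(b), len(b)),
        name=CODEC_NAME,
    )


codecs.register(_search)
```

With the codec registered, `"\U0001F600".encode("java-modified-utf-8")` yields the six bytes `ED A0 BD ED B8 80`, and decoding them returns the single original character. Most of the per-byte work happens inside the built-in UTF-8 codec, which is implemented in C; only supplementary characters pass through Python callbacks. Whether that is fast enough for very large indexes is untested.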
Writing an application-specific codec with the C API is doable in principle, but it seems like an awful lot of effort for just one quickie program. These indexes are very large, so a codec written in Python would probably be painfully slow.

Tracker <http://bugs.python.org/issue2857>