Steven, thanks! Very nice algorithm.
Here is the code:
#!/usr/bin/env python
# -*- coding: utf_8 -*-
# Thanks Steven D'Aprano for hints
import unicodedata
import MySQLdb
# MySQL connection settings
mysql_host = "localhost"
mysql_user = "dict"
mysql_password = "passwd"
mysql_db = "dictionary"
try:
my
Hi.
The essence of the problem is the following:
I have lines in UTF-8 of this form: "BZ?ツーリTV%ツキDVD"
Is it possible to split them into fragments that contain only printable
Latin symbols (the alphabet plus "?#" etc.)
and fragments that contain the hieroglyphs, so the result looks like this:
['BZ?', '\xe3\x83\x84\xe3\x83\xbc
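
One possible way to do this, as a rough sketch only and not necessarily the algorithm Steven suggested: group consecutive characters by whether they are printable ASCII. The names is_latin_printable and split_by_script are made up for illustration, and treating "Latin printable" as the ASCII range from space to tilde is my assumption. The sketch is written for Python 3, where strings are already Unicode; in Python 2 the UTF-8 byte string would need .decode("utf-8") first.

```python
from itertools import groupby


def is_latin_printable(ch):
    # Assumption: "Latin printable" means printable ASCII,
    # i.e. the characters from space (0x20) through tilde (0x7E).
    return " " <= ch <= "~"


def split_by_script(text):
    """Split text into alternating runs of printable-ASCII
    and non-ASCII characters, preserving the original order."""
    return ["".join(run) for _, run in groupby(text, key=is_latin_printable)]


if __name__ == "__main__":
    print(split_by_script("BZ?ツーリTV%ツキDVD"))
```

groupby starts a new run every time the key flips, so the fragments come out in their original order, and joining them gives back the input string.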