Daiyue Weng wrote:
> Hi, when I read a file, the file string contains Mojibake chars at the
> beginning, the code is like,
>
> file_str = open(file_path, 'r', encoding='utf-8').read()
> print(repr(open(file_path, 'r', encoding='utf-8').read())
>
> part of the string (been printing) containing Mojibake chars is like,
>
> '锘縶\n "name": "__NAME__"'
>
> I tried to remove the non utf-8 chars using the code,
>
> def read_config_file(fname):
>     with open(fname, "r", encoding='utf-8') as fp:
>         for line in fp:
>             line = line.strip()
>             line = line.decode('utf-8','ignore').encode("utf-8")
>
>     return fp.read()
>
> but it doesn't work, so how to remove the Mojibakes in this case?

I'd first investigate whether the file can be decoded correctly using an
encoding other than UTF-8, but if it's really hopeless and your best bet is
to ignore all non-ASCII characters, try

    def read_config_file(fname):
        with open(fname, "r", encoding="ascii", errors="ignore") as f:
            return f.read()
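
On the "other encoding" route: the leading '锘縶' is consistent with the bytes
EF BB BF 7B (a UTF-8 byte order mark followed by '{') being displayed through
a GBK code page, so the file may simply start with a BOM rather than contain
invalid data. Below is a minimal sketch under that assumption, which I have
not verified against the actual file. "utf-8-sig" is a standard Python codec;
the helper name starts_with_utf8_bom is made up for illustration.

```python
import codecs


def read_config_file(fname):
    # "utf-8-sig" decodes UTF-8 and drops a leading BOM if one is present;
    # a file without a BOM is read unchanged.
    with open(fname, "r", encoding="utf-8-sig") as f:
        return f.read()


def starts_with_utf8_bom(fname):
    # Hypothetical helper: look at the raw bytes to check the BOM assumption
    # before relying on it.
    with open(fname, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8
```

If starts_with_utf8_bom() returns False for the file, the BOM guess is wrong
and the ASCII fallback above may still be the pragmatic choice. Unlike that
fallback, this version keeps any legitimate non-ASCII text in the file.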