Hello.

At Mon, 18 Mar 2019 14:13:34 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyot...@lab.ntt.co.jp> wrote in <20190318.141334.186469242.horiguchi.kyot...@lab.ntt.co.jp>
> Hello.
>
> At Sun, 17 Mar 2019 20:23:05 -0400, Hugh Ranalli <h...@whtc.ca> wrote in
> <caahbumnoblu7jabyk5mk0lxeyt03pznqt_apkg0z9bsajcl...@mail.gmail.com>
> > Hi Ram,
> > Thanks for doing this; I've been overestimating my ability to get to things
> > over the last couple of weeks.
> >
> > I've looked at the patch and have made one minor change. I had moved all
> > the imports up to the top, to keep them in one place (and I think some had
> > originally been used only by the Python 2 code). You added them there, but
> > didn't remove them from their original positions. So I've incorporated that
> > into your patch, attached as v2. I've tested this under Python 2 and 3 on
> > Linux, not Windows.
>
> Though I'm not sure it is necessary to run the script on
> Windows, the problem is not specific to Windows; it is a general
> one that just hadn't accidentally been hit in non-Windows
> environments.
>
> On CentOS7:
> > export LANG="ja_JP.EUCJP"
> > python <..snipped..>
> ..
> > UnicodeEncodeError: 'euc_jp' codec can't encode character '\xab' in
> > position 0: illegal multibyte sequence
>
> So this is not an issue with Windows but with python3.
>
> The script generates identical files with both versions of
> python with the patch on Linux and Windows 7. Python3 on Windows
> emits CRLF as a new line but it doesn't seem to do harm. (I haven't
> confirmed that, due to extreme slowness of the build for uncertain
> reasons now..)

I confirmed that CRLF actually does no harm and unaccent works
correctly. (t_isspace() treats them as white space, so they are
excluded.)

> This patch contains irrelevant changes. The minimal required
> change would be the attached. If you want to refactor the
> UnicodeData reader or rearrange the import stuff, that should be
> separate patches.
>
> It would be better to use IOBase for Python3, especially for the
> stdout replacement, but I didn't since it *is* working.
>
> > Everything else looks correct. I apologise for not having replied to your
> > question in the original bug report. I had intended to, but as I said,
> > there's been an increase in the things I need to juggle at the moment.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
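
For reference, here is a minimal sketch of the failure quoted above and one possible way around it. It is not taken from the attached patch: `write_text` is a hypothetical helper, and an in-memory buffer stands in for stdout. It assumes only that the output stream's encoding follows the locale (euc_jp under LANG=ja_JP.EUCJP), which is what makes writing U+00AB fail, and that wrapping the stream with an explicit UTF-8 encoding and newline="\n" avoids both the encoding error and CRLF output.

```python
import io


def write_text(text, encoding):
    """Write text through a text stream using the given encoding.

    Hypothetical helper for illustration; returns the bytes produced.
    newline="\n" disables newline translation, so no CRLF is emitted
    even on Windows.
    """
    raw = io.BytesIO()  # stands in for the byte stream underneath stdout
    stream = io.TextIOWrapper(raw, encoding=encoding, newline="\n")
    stream.write(text)
    stream.flush()
    data = raw.getvalue()
    stream.detach()  # leave the underlying buffer open
    return data


sample = u"\u00ab"  # the character from the quoted error message

# A locale-derived euc_jp stream cannot represent U+00AB.
try:
    write_text(sample, "euc_jp")
    euc_jp_failed = False
except UnicodeEncodeError:
    euc_jp_failed = True

# An explicit UTF-8 stream handles it regardless of LANG.
utf8_bytes = write_text(sample, "utf-8")
```

In a real script the same wrapping would presumably be applied to the binary stdout stream rather than to a BytesIO; whether that is preferable to the approach in the patch is a separate question.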