Thank you for the explanation and the tool.

On Tue, May 23, 2017 at 4:07 PM Uwe Schindler <u...@thetaphi.de> wrote:
> Hi,
>
> FileReader is a broken class; this is well known. For that reason it is
> part of the forbidden-apis list, which is also used by Lucene to prevent
> issues like yours in our source code. To correctly specify the character
> set for reading a file, you have to use a FileInputStream and wrap it
> with an InputStreamReader. On the InputStreamReader you can give the
> charset.
>
> See https://github.com/policeman-tools/forbidden-apis
>
> Uwe
>
> -----
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -----Original Message-----
> > From: Kudrettin Güleryüz [mailto:kudret...@gmail.com]
> > Sent: Tuesday, May 23, 2017 9:13 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: utf-8 issues depending on host
> >
> > I create the object as new FileReader(file)
> > Where file is read from File.listFiles() as below:
> > cwd.listFiles(getSourceCodeFilter())
> > File file : files
> >
> > FileReader doesn't seem to have a constructor that lets me specify an
> > encoding, and in fact I feel like I should not be setting it to UTF-8
> > by default, anyway.
> >
> > Let me revise my question: how can I make sure all hosts running this
> > indexer code behave as expected? It certainly runs as expected on one
> > machine while not on others. The one that runs as expected is Debian
> > 8.3; the others are Debian 7.4.
> >
> > Thank you
> >
> > On Tue, May 23, 2017 at 10:45 AM Adrien Grand <jpou...@gmail.com>
> > wrote:
> >
> > > The issue is likely due to how you create the FileReader that you
> > > pass to TextField. Maybe you don't give it the right encoding?
> > >
> > > On Tue, May 23, 2017 at 4:38 PM, Kudrettin Güleryüz
> > > <kudret...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > Depending on the host running the indexer, UTF-8 characters are
> > > > not stored (not correctly, anyway) in the Lucene index.
> > > >
> > > > Interestingly, locale output is identical on all hosts but the
> > > > output is different.
> > > >
> > > > Apparently using FileReader could be the culprit. I am currently
> > > > using TextField(String name, Reader reader)
> > > >
> > > > How can I improve this? What is the suggested way for handling
> > > > this using 5.2.1? TextField(String name, String value, Store
> > > > store)?
> > > >
> > > > Thank you
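
For reference, here is a minimal sketch of the approach Uwe describes in the quoted message: a FileInputStream wrapped in an InputStreamReader that is given an explicit charset. The class name ExplicitCharsetReader and the helper open() are hypothetical, and UTF-8 is only an assumed choice; the thread never establishes which encoding the source files actually use. The reason hosts can differ is that new FileReader(file) decodes with the JVM's platform default charset, which may not be the same on every machine.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetReader {

    // Hypothetical helper (not part of Lucene or the JDK): opens a file
    // with an explicitly chosen charset instead of the platform default
    // that new FileReader(file) would use.
    static Reader open(File file, Charset charset) throws IOException {
        return new BufferedReader(
                new InputStreamReader(new FileInputStream(file), charset));
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the file named on the command line is UTF-8 encoded.
        try (Reader reader = open(new File(args[0]), StandardCharsets.UTF_8)) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                System.out.print(new String(buffer, 0, n));
            }
        }
    }
}
```

A Reader obtained this way could then be passed to TextField(String name, Reader reader) in place of the FileReader, so that every host decodes the files the same way regardless of its default encoding.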