Re: utf-8 issues depending on host

2017-05-23 Thread Kudrettin Güleryüz
Thank you for the explanation and the tool. On Tue, May 23, 2017 at 4:07 PM Uwe Schindler wrote: > Hi, > > FileReader is a broken class; this is well known. For that reason it is > part of the forbidden-apis list, which is also used by Lucene to prevent > issues like yours in our source code. To c

RE: utf-8 issues depending on host

2017-05-23 Thread Uwe Schindler
Hi, FileReader is a broken class; this is well known. For that reason it is part of the forbidden-apis list, which is also used by Lucene to prevent issues like yours in our source code. To correctly specify the character set for reading a file, you have to use a FileInputStream and wrap it with
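
A minimal sketch of reading a file with an explicit character set, using only the JDK. It shows one common way to wrap a FileInputStream (here with InputStreamReader); the class name ExplicitCharsetReader and the helpers open and readAll are hypothetical names chosen for illustration, and UTF-8 is an assumption about the files being indexed. On Java 11 and later, FileReader also has a constructor that accepts a Charset.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class ExplicitCharsetReader {

    // Opens a Reader that decodes the file with the given charset,
    // rather than relying on the JVM default charset.
    public static Reader open(File file, Charset charset) throws IOException {
        return new BufferedReader(
                new InputStreamReader(new FileInputStream(file), charset));
    }

    // Hypothetical helper for the demo: drains a Reader into a String
    // and closes it afterwards.
    public static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = reader) {
            char[] buf = new char[4096];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Write a small UTF-8 file containing non-ASCII characters.
        // Unicode escapes keep this source independent of javac's -encoding.
        File file = File.createTempFile("utf8-demo", ".txt");
        file.deleteOnExit();
        String original = "G\u00fclery\u00fcz";
        Files.write(file.toPath(), original.getBytes(StandardCharsets.UTF_8));

        // Read it back with an explicit charset; the result does not depend
        // on the host's default encoding.
        String roundTripped = readAll(open(file, StandardCharsets.UTF_8));
        System.out.println(original.equals(roundTripped));
    }
}
```

A Reader obtained this way can be passed anywhere a FileReader was used before, since both are java.io.Reader instances.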

Re: utf-8 issues depending on host

2017-05-23 Thread Kudrettin Güleryüz
I create the object as new FileReader(file), where file is read from File.listFiles() as below: cwd.listFiles(getSourceCodeFilter()); File file : files. FileReader doesn't seem to have a constructor that lets me specify an encoding, and in fact I feel like I should not be setting it to UTF-8 by defau

Re: utf-8 issues depending on host

2017-05-23 Thread Adrien Grand
The issue is likely due to how you create the FileReader that you pass to TextField. Maybe you don't give it the right encoding? On Tue, May 23, 2017 at 16:38, Kudrettin Güleryüz wrote: > Hi, > > Depending on the host running indexer, UTF-8 characters are not stored (not > correctly, anyways) i

Re: MultiTermQuery vs multiple TermQuery'ies - is there a performance gain?

2017-05-23 Thread Adrien Grand
Rather than MultiTermQuery, you could consider using TermInSetQuery. Depending on the number of terms, it will use sensible defaults. The main drawback is that it returns constant scores, but if this does not work for you then MultiTermQuery would not work either. On Tue, May 23, 2017 at 13:52, Mich

utf-8 issues depending on host

2017-05-23 Thread Kudrettin Güleryüz
Hi, Depending on the host running the indexer, UTF-8 characters are not stored (not correctly, anyway) in the Lucene index. Interestingly, the output of locale is identical on all hosts, but the indexed output differs. Apparently using FileReader could be the culprit. I am currently using TextField(String name,
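
A small diagnostic sketch of how the same UTF-8 bytes can decode differently depending on the charset in effect. It assumes, as one possible explanation only, that the JVM default charset differs between hosts even when the shell's locale output matches (for example because the process is started with a different environment or file.encoding setting). The class name CharsetMismatchDemo and the method decodeUtf8BytesAs are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {

    // Encodes the text as UTF-8 bytes, then decodes those bytes with the
    // given charset. A mismatched charset yields garbled characters.
    public static String decodeUtf8BytesAs(String text, Charset charset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, charset);
    }

    public static void main(String[] args) {
        // The charset that APIs without an explicit charset argument fall
        // back to; worth comparing across the hosts in question.
        System.out.println("JVM default charset: " + Charset.defaultCharset());

        String original = "\u00fc"; // LATIN SMALL LETTER U WITH DIAERESIS

        // Decoding with the matching charset round-trips the character.
        String asUtf8 = decodeUtf8BytesAs(original, StandardCharsets.UTF_8);
        System.out.println("Decoded as UTF-8 matches original: "
                + original.equals(asUtf8));

        // Decoding the two UTF-8 bytes (0xC3 0xBC) as ISO-8859-1 produces
        // two unrelated characters instead of one.
        String asLatin1 = decodeUtf8BytesAs(original, StandardCharsets.ISO_8859_1);
        System.out.println("Decoded as ISO-8859-1 length: " + asLatin1.length());
    }
}
```

Running the first print statement on each host is a quick way to check whether the default charset, rather than the file contents, is what differs.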

MultiTermQuery vs multiple TermQuery'ies - is there a performance gain?

2017-05-23 Thread Michael Wilkowski
Hi, I am building an app that will create multiple term queries joined with OR (more than 100 primitive TermQuery instances). Is there a real performance gain from implementing a custom MultiTermQuery instead of simply joining multiple TermQuery instances with OR? Regards, MW