I am using org.apache.lucene.queryparser.classic.QueryParser in Lucene 6.0.0 to parse queries with a CustomAnalyzer, as shown below:

    public static void testFilmAnalyzer() throws IOException, ParseException {
        CustomAnalyzer nameAnalyzer = CustomAnalyzer.builder()
                .addCharFilter("patternreplace",
                        "pattern", "(movie|film|picture).*",
                        "replacement", "")
                .withTokenizer("standard")
                .build();

        QueryParser qp = new QueryParser("name", nameAnalyzer);
        qp.setDefaultOperator(QueryParser.Operator.AND);

        String[] strs = {"avatar film fiction", "avatar-film fiction", "avatar-film-fiction"};

        for (String str : strs) {
            System.out.println("Analyzing \"" + str + "\":");
            showTokens(str, nameAnalyzer);

            Query q = qp.parse(str);
            System.out.println("Parsed query of \"" + str + "\":");
            System.out.println(q + "\n");
        }
    }

    private static void showTokens(String text, Analyzer analyzer) throws IOException {
        StringReader reader = new StringReader(text);
        TokenStream stream = analyzer.tokenStream("name", reader);
        CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
        stream.reset();
        while (stream.incrementToken()) {
            System.out.print("[" + term.toString() + "]");
        }
        stream.end(); // complete the TokenStream workflow before closing
        stream.close();
        System.out.println();
    }

I get the following output when I invoke testFilmAnalyzer():

    Analyzing "avatar film fiction":
    [avatar]
    Parsed query of "avatar film fiction":
    +name:avatar +name:fiction

    Analyzing "avatar-film fiction":
    [avatar]
    Parsed query of "avatar-film fiction":
    +name:avatar +name:fiction

    Analyzing "avatar-film-fiction":
    [avatar]
    Parsed query of "avatar-film-fiction":
    name:avatar

It seems that the analyzer applies the PatternReplaceCharFilter in its intended order (i.e. before tokenization), while the QueryParser applies it afterwards. Does anyone have an explanation for that? Isn't that a bug?
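
To narrow this down, here is a Lucene-free sketch that reproduces both sets of results with plain java.util.regex. It rests on an assumption I have not confirmed in the parser source: that the classic QueryParser splits the query string on whitespace first and hands each chunk to the analyzer separately, whereas showTokens analyzes the whole string in one pass. The class and method names (CharFilterOrderSketch, analyzeWhole, analyzePerChunk) are hypothetical, and the standard tokenizer is only approximated by splitting on non-letter, non-digit characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class CharFilterOrderSketch {

    // Same pattern as the "patternreplace" char filter in the question.
    private static final Pattern FILM = Pattern.compile("(movie|film|picture).*");

    // Char filter first, then a rough stand-in for the standard tokenizer
    // (assumption: split on anything that is not a letter or digit).
    static List<String> analyzeWhole(String text) {
        String filtered = FILM.matcher(text).replaceAll("");
        List<String> tokens = new ArrayList<>();
        for (String t : filtered.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Assumed query parser behavior: split on whitespace first,
    // then run the full analysis chain on each chunk independently.
    static List<String> analyzePerChunk(String text) {
        List<String> tokens = new ArrayList<>();
        for (String chunk : text.trim().split("\\s+")) {
            tokens.addAll(analyzeWhole(chunk));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String[] strs = {"avatar film fiction", "avatar-film fiction", "avatar-film-fiction"};
        for (String str : strs) {
            System.out.println("\"" + str + "\"");
            System.out.println("  whole string: " + analyzeWhole(str));
            System.out.println("  per chunk:    " + analyzePerChunk(str));
        }
    }
}
```

Under that assumption, the whole-string pass yields [avatar] for all three inputs, while the per-chunk pass yields [avatar, fiction] for the first two and [avatar] for the third, which matches the parsed queries above. If this is what happens, the char filter still runs before tokenization in both cases; it just only sees one whitespace-separated chunk at a time inside the parser.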