Re: how to fully preprocess query before fuzzy search?

2012-09-17 Thread Jack Krupansky
Either is fine. In fact, just escape based on the individual character, not the context. The multi-character context is telling you places where escape is not essential, but that doesn't mean it would hurt. -- Jack Krupansky -Original Message- From: Ilya Zavorin Sent: Monday, Septembe

Re: Hibernate Search with Regex based on Table

2012-09-17 Thread Sanne Grinovero
Right, you should use the MappingCharFilter from Solr; Hibernate Search can use the Solr tokenizers and filters: http://docs.jboss.org/hibernate/search/4.2/reference/en-US/html_single/#d0e462 To answer your other questions: > In short: Would it be possible to introduce Hibernate Search in the > p

RE: how to fully preprocess query before fuzzy search?

2012-09-17 Thread Ilya Zavorin
Thanks. So I do not need to escape the "&" in "dog & cat", but I do need to escape the "&&" in "dog && cat", correct? And do I escape it as "dog \&& cat" or as "dog \&\& cat"? Ilya -Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Monday, Se

Re: how to fully preprocess query before fuzzy search?

2012-09-17 Thread Jack Krupansky
" Lucene supports escaping special characters that are part of the query syntax. The current list special characters are + - && || ! ( ) { } [ ] ^ " ~ * ? : \ / " See: http://lucene.apache.org/core/4_0_0-ALPHA/queryparser/org/apache/lucene/queryparser/classic/package-summary.html So, maybe yo
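Following Jack's advice to escape per character rather than per context, here is a minimal sketch that backslash-escapes every character appearing in the documented list above. The class name `QueryEscaper` is hypothetical. The classic query parser also has a static `QueryParser.escape(String)` helper that may be preferable, but check what it escapes in your Lucene version. Because escaping is done one character at a time, "&&" comes out as "\&\&".

```java
// Hypothetical helper: escape each classic-query-syntax special character
// individually, without looking at the surrounding context.
public final class QueryEscaper {
    // Single characters taken from the documented list
    // + - && || ! ( ) { } [ ] ^ " ~ * ? : \ /
    // ("&&" and "||" are covered by escaping '&' and '|' one at a time).
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\/";

    private QueryEscaper() {}

    public static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() * 2);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\'); // prefix a backslash to any special character
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("dog & cat"));  // dog \& cat
        System.out.println(escape("dog && cat")); // dog \&\& cat
    }
}
```

Escaping a lone "&" is not required by the syntax, but as noted above it does no harm, and it keeps the rule simple.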

how to fully preprocess query before fuzzy search?

2012-09-17 Thread Ilya Zavorin
I am processing a bunch of text coming out of OCR, i.e. it's machine-generated text that contains some errors like garbage characters attached to words, letters replaced with similar-looking characters (e.g. "I" with "1") etc. The text is whitespace-tokenized and I am trying to match each toke
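One way to preprocess such input is to split each line on whitespace, escape each token, and append a fuzzy operator. The sketch below is built on several assumptions that are not stated in the thread: the class name `FuzzyQueryBuilder` and the `maxEdits` parameter are hypothetical, and it relies on the Lucene 4.x classic syntax `term~N`, where N (at most 2) is read as a maximum edit distance. It does not try to strip OCR garbage. It only makes each token safe to parse.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turn one line of OCR output into one fuzzy
// query string per whitespace-separated token.
public final class FuzzyQueryBuilder {
    // Same per-character rule as the classic query syntax special characters.
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\/";

    private FuzzyQueryBuilder() {}

    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() * 2);
        for (char c : s.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) sb.append('\\');
            sb.append(c);
        }
        return sb.toString();
    }

    // maxEdits is an assumption: 1 or 2 in Lucene 4.x fuzzy syntax.
    public static List<String> fuzzyQueries(String line, int maxEdits) {
        List<String> out = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (token.isEmpty()) continue; // a blank line yields one empty token
            out.add(escape(token) + "~" + maxEdits);
        }
        return out;
    }

    public static void main(String[] args) {
        // "!" is escaped, "," is not special and is left as-is.
        System.out.println(fuzzyQueries("He11o, w0rld!", 1));
    }
}
```

Escaping before appending "~" matters here: an OCR token that already contains "~" or ":" would otherwise change the meaning of the query.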