from:"Rob Hasselbaum"

NGramTokenFilter filters out small tokens?

2011-12-15 Thread Rob Hasselbaum

Hi. I'm trying to configure an analyzer to be somewhat forgiving of spelling mistakes in longer words of a search query. So, for example, if a word in the query matches at least five characters of an indexed word (token), I want that to be a hit. NGramTokenFilter with a minimum gram size of 5 seems
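
To make the question in the thread title concrete, here is a minimal plain-Java sketch of fixed-range n-gram generation. It is an illustration only, not Lucene's NGramTokenFilter code: the class name NGramSketch and the method ngrams are hypothetical, and the order in which a real filter emits grams may differ. It shows the arithmetic behind the concern: with a minimum gram size of 5, a token shorter than five characters has no substring of the required length, so no grams are produced for it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Lucene.
public class NGramSketch {

    // Returns every substring of `token` whose length is between
    // minGram and maxGram (inclusive), shortest sizes first.
    public static List<String> ngrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int size = minGram; size <= maxGram; size++) {
            // The loop body never runs when size > token.length(),
            // so short tokens contribute nothing.
            for (int start = 0; start + size <= token.length(); start++) {
                grams.add(token.substring(start, start + size));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("analyzer", 5, 5)); // prints [analy, nalyz, alyze, lyzer]
        System.out.println(ngrams("cat", 5, 5));      // prints []
    }
}
```

If short words still need to match, one option under these assumptions is to index the original token alongside its grams, for example in a separate field analyzed without the n-gram step.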

Mixing norms and no norms in the same document

2011-12-05 Thread Rob Hasselbaum

Hi. I'm indexing about 20,000 documents that could potentially have a few thousand fields with the same field name. I've read in the mailing list archives that there is no hard limit to the number of fields in a document, but that storing norms can be a problem because of the RAM overhead. I don't