Hi there,

I want some hyphenated terms to stay intact instead of being split by the tokenizer, for example: e-cigarette, e-cig, I-pad. I don't want them split into "e" and "cig" or "I" and "pad", because the single letters "e" and "I" produce too many false-positive matches.
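To make the desired behavior concrete, here is a minimal Python sketch. It is not Lucene code and does not use the standard tokenizer; `PROTECTED_TERMS` and `tokenize` are hypothetical names chosen only for illustration. It splits hyphenated words into parts unless the whole word is on a protected list.

```python
import re

# Hypothetical list of terms that should survive tokenization intact
# (stored lowercase so the lookup is case-insensitive).
PROTECTED_TERMS = {"e-cigarette", "e-cig", "i-pad"}

# A run of letters/digits, optionally joined by single hyphens.
_WORD_RE = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def tokenize(text):
    """Split text into tokens, keeping protected hyphenated terms whole."""
    tokens = []
    for match in _WORD_RE.finditer(text):
        word = match.group(0)
        if word.lower() in PROTECTED_TERMS:
            # Protected term: emit it as a single token.
            tokens.append(word)
        else:
            # Any other word: split on hyphens as usual.
            tokens.extend(word.split("-"))
    return tokens


print(tokenize("The e-cig and I-pad are well-known"))
```

With this sketch, "e-cig" and "I-pad" come out as single tokens while "well-known" is still split into "well" and "known", which is the behavior I am hoping to get from the real tokenizer.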
Is there a way to tell the standard tokenizer to skip tokenizing some terms?

Rebecca Tang
Applications Developer, UCSF CKM
Legacy Tobacco Document Library <legacy.library.ucsf.edu/>
E: rebecca.t...@ucsf.edu