I'm fiddling with custom analyzers for email addresses, so that both the full
email address and its component parts are stored. It's based on Solr's analyzer
framework, so I have a StandardTokenizerFactory followed by an
EmailFilterFactory. It produces:
Analyzing "<[EMAIL PROTECTED]>"
1: [EMAIL PROTECTED]:1->31:<EMAIL>]
2: [humphrey:1->9:<EMAIL>]
3: [bogart:10->16:<EMAIL>]
4: [casablanca:17->27:<EMAIL>]
5: [com:28->31:<EMAIL>]
I set the start/end offsets of each component to its actual position in the
original text. Listing 4.6 in the LIA book, however, gives the synonyms the same
start/end offsets as the original token, whereas mine are the true start and end
of each part.
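
To make the offset scheme concrete, here is a minimal sketch in plain Java of
the splitting logic described above. It does not use the Lucene or Solr APIs:
the EmailParts and Part names, the choice of '.' and '@' as separators, and the
sample address are all assumptions made for illustration, not the actual
EmailFilterFactory code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Lucene or Solr.
class EmailParts {

    // Minimal stand-in for a token: its text plus character offsets
    // into the original input (start inclusive, end exclusive).
    static final class Part {
        final String text;
        final int start;
        final int end;

        Part(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + text + ":" + start + "->" + end + "]";
        }
    }

    // Returns the full address first, then each component, where every
    // component carries its own offsets rather than those of the full token.
    static List<Part> split(String input) {
        List<Part> parts = new ArrayList<>();
        int begin = 0;
        int finish = input.length();

        // Skip optional surrounding angle brackets.
        if (finish > 0 && input.charAt(0) == '<') {
            begin++;
        }
        if (finish > begin && input.charAt(finish - 1) == '>') {
            finish--;
        }

        // The whole address, spanning everything between the brackets.
        parts.add(new Part(input.substring(begin, finish), begin, finish));

        // Assumed separators: '.' and '@'. Empty components are skipped.
        int tokenStart = begin;
        for (int i = begin; i <= finish; i++) {
            boolean atSeparator = i == finish
                    || input.charAt(i) == '.'
                    || input.charAt(i) == '@';
            if (atSeparator) {
                if (i > tokenStart) {
                    parts.add(new Part(input.substring(tokenStart, i), tokenStart, i));
                }
                tokenStart = i + 1;
            }
        }
        return parts;
    }
}
```

Calling EmailParts.split("<first.last@example.com>") gives the full address
followed by four parts, each with its own offsets. The listing 4.6 approach
would instead give every part the offsets of the full address.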
LIA says the offsets are not used in Lucene. Is that still the case for 2.1,
and does the difference matter?
Thanks
Antony