Thanks, Pierrick. Are you saying that I should construct the Token in the analyzer like new Token ("chem_H2O", 100, 103, "chem");
note that chem_ is a prefix added to H2O, and that 100 to 103 covers the length of H2O rather than of chem_H2O?

I also have a further problem and am not sure whether it can be solved by this approach. I want to index H2O inside a compound, say H2O-CH2, and I want a query that finds H2O within that compound. How can I do that?

Thanks,
Ethan

-------------- Original message --------------
> [EMAIL PROTECTED] wrote:
>
> > I am working on a program to index/search chemical elements/compounds. Say I
> > write an analyzer to filter out chemical terms, such as H2O. I noticed that I
> > can specify a token's type. Can I construct a token as
> > new Token ("H2", start, end, "chem");
> >
> > My question is:
> > How do I search for all the tokens with the "chem" type, such as H2O, O2,
> > etc.? Is there any sample like this?
> >
> > If this approach doesn't work, what's the best approach?
>
> You may assign a type to the tokens, and then you may filter them
> according to their type, *but* the index forgets this info since it
> stores *terms* (field/value pairs).
>
> Compare:
> http://lucene.apache.org/java/docs/api/org/apache/lucene/analysis/Token.html
> and
> http://lucene.apache.org/java/docs/api/org/apache/lucene/index/Term.html
>
> Notice, however, that the terms also have their relative position (the
> Token's positionIncrement, default = 1) stored in the index; this
> allows proximity searches.
>
> So... how to do it?
>
> 1) use a dedicated field "chem" where only chemical content is allowed
>    (filter out every token whose type is different from "chem")
> 2) manipulate your termText: "chem_H2"; do the same for your queries
> 3) play with the query rather than with the index content: filter out
>    what is not chemical
>
> There may be other solutions...
>
> Cheers,
>
> p.b.
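
To make the prefix idea (option 2 in the quoted reply) and the H2O-CH2 question concrete, here is a minimal plain-Java sketch. It does not use Lucene: ChemTokenSketch and ChemToken are hypothetical stand-ins, and splitting a compound on '-' is only an assumption about how the analyzer might break H2O-CH2 into parts. Each part gets the chem_ prefix in its term text while its offsets still point at the unprefixed text in the original input, and the query side applies the same prefix.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch; ChemTokenSketch and ChemToken are hypothetical names,
// not Lucene classes.
final class ChemTokenSketch {
    static final String PREFIX = "chem_";

    // Stand-in for a token: term text, offsets into the original input, and a type.
    static final class ChemToken {
        final String text;
        final int start;
        final int end;
        final String type;

        ChemToken(String text, int start, int end, String type) {
            this.text = text;
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    // Assumption: a compound such as "H2O-CH2" is split on '-'.
    // baseOffset is where the compound starts in the original input.
    static List<ChemToken> tokenizeCompound(String compound, int baseOffset) {
        List<ChemToken> out = new ArrayList<>();
        int pos = 0;
        while (pos <= compound.length()) {
            int dash = compound.indexOf('-', pos);
            int stop = (dash < 0) ? compound.length() : dash;
            if (stop > pos) {
                String part = compound.substring(pos, stop);
                // Term text is prefixed; offsets cover the unprefixed part only.
                out.add(new ChemToken(PREFIX + part, baseOffset + pos, baseOffset + stop, "chem"));
            }
            pos = stop + 1;
        }
        return out;
    }

    // The query side has to apply the same prefix as the index side.
    static String toQueryTerm(String formula) {
        return PREFIX + formula;
    }

    // True if any token's term text equals the prefixed query term.
    static boolean matches(List<ChemToken> tokens, String formula) {
        String wanted = toQueryTerm(formula);
        for (ChemToken t : tokens) {
            if (t.text.equals(wanted)) {
                return true;
            }
        }
        return false;
    }
}
```

With this split, a query for H2O becomes the term chem_H2O and matches the first part of H2O-CH2. Whether splitting on '-' is chemically meaningful for a given data set is a separate question that the sketch does not settle.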