Thanks Pierrick.
Are you say that I should construct Token in analyzer like
new Token ("chem_H2O", 100, 103, "chem");
note that chem_ is added prefix to H2O, and 100 to 103 is length of H2O rather
than chem_H2O?
I also have some further problem and not sure if can be solved by this approch.
I want to index H2O in a compound, say H2O-CH2. say I want a query to find out
H2O in a compound. How can I do that?
Thanks,
Ethan
-- Original message --
> [EMAIL PROTECTED] a écrit :
>
> > I am working on a program to index/search chemical element/compound. Say I
> write an analyzer to filter out chemical terms, such as H2O. I noticed that I
> can specify a tocken's type. Can I construct a token as
> > new Token ("H2", start, end, "chem");
> >
> > My questions is
> > How do I search all the tokens with "chem" type token, such as H2O, O2,
> > etc?
> Any sample like this?
> >
> > If this approach doesn't work, what's the best approach?
>
> You may assign a type to the tokens, and then you may filter them
> according to their type *but* the index forgets this info since it
> stores *terms* (field/value pairs).
>
> Compare :
> http://lucene.apache.org/java/docs/api/org/apache/lucene/analysis/Token.html
> and
> http://lucene.apache.org/java/docs/api/org/apache/lucene/index/Term.html
>
> Notice however that the terms have also their relative position (the
> Token's positionIncrement, default = 1) stored in the index ; this
> allows proximity searches.
>
> So... how to do ?
>
> 1) use a dedicated field "chem" where only chemical content is allowed
> (filter out every token whose type is different from "chem")
> 2) manipulate your termText : "chem_H2" ; the same for your queries
> 3) play with the query rather than with the index content : filter out
> what is not chemical
>
> There may be other solutions...
>
> Cheers,
>
> p.b.
>
>
>
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>