Hi

I was asked to integrate with a system which provides synonyms for words
through an API. I checked the existing synonym filters in Lucene and Solr,
and they all seem to take a synonyms map up front.

E.g. Lucene's SynonymFilter takes a SynonymMap which exposes an FST, so
it's not programmatic in the sense that I could provide an impl which
pulls the synonyms through the other system's API.

Solr's SynonymFilterFactory just loads the synonyms from a file into a
SynonymMap and then uses Lucene's SynonymFilter, so it doesn't look like I
can extend that one either.

The problem is that the synonyms DB I should integrate with is HUGE and
will probably not fit in RAM (i.e. in a SynonymMap). Nor is it currently
possible to pull all available synonyms from it in one go. The API I have
is something like String[] getSynonyms(String word).
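
To make the constraint concrete, here is a minimal sketch of calling such a
per-word API from analysis code without holding the whole DB in RAM: a
bounded LRU cache in front of the remote call. CachingSynonymLookup, the
Function standing in for the external system, and the maxEntries parameter
are all hypothetical names for illustration; none of this is Lucene or Solr
API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical wrapper: a bounded LRU cache in front of a remote getSynonyms(word) call. */
final class CachingSynonymLookup {
    // Stand-in for the external system's String[] getSynonyms(String word) call.
    private final Function<String, String[]> remote;
    private final Map<String, String[]> cache;

    CachingSynonymLookup(Function<String, String[]> remote, final int maxEntries) {
        this.remote = remote;
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.cache = new LinkedHashMap<String, String[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String[]> eldest) {
                // Evict the least recently used entry once the bound is exceeded.
                return size() > maxEntries;
            }
        };
    }

    /** Returns cached synonyms, calling the remote system only on a cache miss. */
    synchronized String[] getSynonyms(String word) {
        String[] cached = cache.get(word);
        if (cached == null) {
            String[] fetched = remote.apply(word);
            // Cache "no synonyms" as an empty array so misses are not re-fetched.
            cached = (fetched == null) ? new String[0] : fetched;
            cache.put(word, cached);
        }
        return cached;
    }
}
```

The sketch holds the lock during the remote call for simplicity; a real
integration would probably want something less coarse, plus timeouts and
error handling for the remote system.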

So I have a few questions:

1) Did I miss a filter which takes a programmatic synonym map, one that I
can provide my own impl for?

2) If not, would it make sense to modify SynonymMap to offer a
getSynonyms(word) API (using BytesRef / CharsRef of course), with an
FSTSynonymMap default impl, so that users can provide their own impl, e.g.
one that does not require everything to be in RAM?

2.1) A side-effect benefit, I think, is that we won't require everyone to
deal with the FST API that way, though I'll admit I cannot think of many use
cases for not using SynonymFilter as-is ...
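
A rough sketch of the abstraction proposed in (2), in plain Java so it runs
without Lucene on the classpath. SynonymSource, InMemorySynonymSource and
SynonymExpander are hypothetical names, not existing Lucene types; String
stands in for BytesRef / CharsRef, and the HashMap-backed class stands in
for the proposed FST-backed default impl.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical lookup contract; not an existing Lucene type. */
interface SynonymSource {
    List<String> getSynonyms(String word);
}

/** Stand-in for the proposed FST-backed default: everything is held in RAM. */
final class InMemorySynonymSource implements SynonymSource {
    private final Map<String, List<String>> entries = new HashMap<>();

    InMemorySynonymSource add(String word, String... synonyms) {
        entries.put(word, new ArrayList<>(Arrays.asList(synonyms)));
        return this;
    }

    @Override
    public List<String> getSynonyms(String word) {
        List<String> found = entries.get(word);
        return found == null ? Collections.<String>emptyList() : found;
    }
}

/** What a filter would do per token: keep the original, then append the source's synonyms. */
final class SynonymExpander {
    private final SynonymSource source;

    SynonymExpander(SynonymSource source) {
        this.source = source;
    }

    List<String> expand(String token) {
        List<String> out = new ArrayList<>();
        out.add(token);
        out.addAll(source.getSynonyms(token));
        return out;
    }
}
```

A remote-backed impl would plug in the same way, e.g.
new SynonymExpander(word -> Arrays.asList(remoteApi.getSynonyms(word))),
where remoteApi is the external system. The sketch only covers single-word
lookups; multi-word synonyms and token positions, which SynonymFilter has to
deal with, are ignored here.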

3) If the answer to (1) and (2) is NO, I guess my only option is to
implement my own SynonymFilter, copying most of the code from Lucene's ...
right?

Shai
