> Eg, you'd index only "boston", "red", "sox", "rumor" into the FST, and
> then have a separate search index with "boston red sox rumor" indexed
> as a document. If the user types "red so", then you run suggest on
> "red" and on "so", and then run a hmm MultiPhraseQuery for
> (red|redmond|reddit) (so|sox|sophomore|...) against the index? How to

I know of at least one company (ehm, I can't name it) that does this for matching physical locations against user queries (n yo => new york, etc.). Granted, this is a very closed domain and the boosts can be approximated pretty well (cities by the number of citizens, streets by their location, etc.). It's a good idea to try out, though.

Another possible alternative would be to run a frequent phrase extraction algorithm of some sort and collect only the best candidate phrases. I bet a lot of these could fit into an FST, perhaps even indexed at every starting token's position so that infix searches could work. If you need absolutely all suggestions, you'll need to come up with something more clever.

Dawid
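
To make the "indexed at every starting token's position" idea a bit more concrete, here is a rough sketch in Python. It is only an illustration: a sorted list with binary search stands in for the FST, the function names (build_infix_index, suggest) are made up, and the phrases and weights are invented sample data. It handles a query that is a prefix of some token-aligned suffix of a phrase ("red so"), not the per-token abbreviation case ("n yo").

```python
from bisect import bisect_left


def build_infix_index(phrases):
    """Index every phrase under each of its token-aligned suffixes.

    `phrases` is an iterable of (phrase, weight) pairs. The sorted list
    returned here is a stand-in for an FST: both support cheap prefix
    lookups over sorted keys.
    """
    entries = []
    for phrase, weight in phrases:
        tokens = phrase.lower().split()
        for i in range(len(tokens)):
            # The key starts at token i; the value points back to the full phrase.
            entries.append((" ".join(tokens[i:]), phrase, weight))
    entries.sort()
    return entries


def suggest(index, query, limit=5):
    """Return full phrases that have a token-aligned suffix starting with `query`."""
    prefix = " ".join(query.lower().split())
    # (prefix,) sorts before every (prefix..., phrase, weight) entry,
    # so this finds the first candidate key.
    start = bisect_left(index, (prefix,))
    best = {}
    for suffix, phrase, weight in index[start:]:
        if not suffix.startswith(prefix):
            break  # keys are sorted, so no later key can match
        best[phrase] = max(weight, best.get(phrase, weight))
    # Highest weight first, ties broken alphabetically.
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return [phrase for phrase, _ in ranked[:limit]]


# Hypothetical candidate phrases with made-up weights.
SAMPLE_PHRASES = [
    ("boston red sox rumor", 10),
    ("red socks sale", 3),
    ("new york", 50),
    ("york minster", 4),
]

if __name__ == "__main__":
    index = build_infix_index(SAMPLE_PHRASES)
    print(suggest(index, "red so"))
    print(suggest(index, "yo"))
```

The cost is one key per token per phrase, which is why keeping only the best candidate phrases matters before building the index.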