Hi everyone,

I have a use case where I need to provide suggestions (autocomplete) on multivalued tags. For example, suppose I have the documents below:
docId | users                 | tags (multivalued/array field)
-----------------------------------------------------------
doc1  | [user1, user2, user3] | one, two, thirty
doc2  | [user1, user3, user5] | two, twenty five, four
doc3  | [user2, user4]        | thirty, forty nine, twenty

Query:
--------
{
  prefix: "tw"
  users: ["user1"]
}

Expected Output: (filter + no duplicates)
----------------------
"two", "twenty five"

I am curious to know whether I can use the FST-based suggester (FSTCompletionLookup). I would like to filter by a few fields (if not many). I looked into Elasticsearch, and the context suggester (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters.html#context-suggester) seems to be a good fit. As the FST is held on the heap, I am wondering whether this will scale well, or whether it will cause GC problems or other scaling issues for me in the future.

Here are some things I would like to understand:

1. How does adding a context affect performance and memory footprint? Does it create one FST for each unique combination of contexts?
2. What is the recommended number of shards (if I decide to put this in a separate index)? Should I keep the number of shards to a minimum?
3. Does it scale well horizontally? As the index size grows, can I add more machines and expect the system to keep up?

Any explanation of the internal implementation details would also help me understand it better. I read http://blog.mikemccandless.com/2010/12/using-finite-state-transducers-in.html to get the overall idea.

I would appreciate some practical advice on whether this is a good approach. I am also curious to hear whether there are alternatives to Elasticsearch that people have used to provide suggestions on multivalued fields.

Thanks,
Srini
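
P.S. To make the expected behaviour concrete, here is a minimal brute-force sketch in Python of the semantics I am after. It is only a reference for the filter + dedupe logic, not how a suggester would implement it. The names `DOCS` and `suggest` are made up for illustration, and I am assuming that a document matches when it contains any one of the requested users.

```python
# Hypothetical in-memory copy of the three example documents above.
DOCS = [
    {"id": "doc1", "users": ["user1", "user2", "user3"], "tags": ["one", "two", "thirty"]},
    {"id": "doc2", "users": ["user1", "user3", "user5"], "tags": ["two", "twenty five", "four"]},
    {"id": "doc3", "users": ["user2", "user4"], "tags": ["thirty", "forty nine", "twenty"]},
]


def suggest(prefix, users, docs=DOCS):
    """Return distinct tags starting with `prefix`, taken only from
    documents that list at least one of `users` (assumed semantics)."""
    wanted = set(users)
    seen = set()
    suggestions = []
    for doc in docs:
        # Filter step: skip documents that share no user with the query.
        if wanted.isdisjoint(doc["users"]):
            continue
        for tag in doc["tags"]:
            # Prefix match plus dedupe, keeping first-seen order.
            if tag.startswith(prefix) and tag not in seen:
                seen.add(tag)
                suggestions.append(tag)
    return suggestions


print(suggest("tw", ["user1"]))
```

For the query above this yields "two" and "twenty five": "twenty" is dropped because doc3 is not visible to user1, and "two" appears only once even though it is in both doc1 and doc2.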