suggester for multivalued (tags) field

Srinivasan Ramaswamy Mon, 12 Oct 2020 00:51:51 -0700

Hi Everyone

I have a use case where I need to provide suggestions (autocomplete) on
multivalued tags. For example, suppose I have the documents below:

docId | users                 | tags (multivalued/array field)
------+-----------------------+-------------------------------
doc1  | [user1, user2, user3] | one, two, thirty
doc2  | [user1, user3, user5] | two, twenty five, four
doc3  | [user2, user4]        | thirty, forty nine, twenty

Query:
--------
{
  "prefix": "tw",
  "users": ["user1"]
}

Expected Output: (filter + no duplicates)
----------------------
"two",
"twenty five"
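
To make the expected behavior concrete, here is a minimal brute-force sketch
in Python. It is only a statement of the semantics I am after (filter by user,
match by prefix, drop duplicates), not how an FST-based suggester works
internally. The names DOCS and suggest are made up for illustration.

```python
# Sample documents copied from the table above.
DOCS = [
    {"id": "doc1", "users": ["user1", "user2", "user3"],
     "tags": ["one", "two", "thirty"]},
    {"id": "doc2", "users": ["user1", "user3", "user5"],
     "tags": ["two", "twenty five", "four"]},
    {"id": "doc3", "users": ["user2", "user4"],
     "tags": ["thirty", "forty nine", "twenty"]},
]


def suggest(docs, prefix, users):
    """Return distinct tags starting with `prefix`, taken only from
    documents that list at least one of `users`."""
    wanted = set(users)
    seen = set()
    results = []
    for doc in docs:
        # Filter step: skip documents not associated with any requested user.
        if wanted.isdisjoint(doc["users"]):
            continue
        for tag in doc["tags"]:
            # Prefix match plus de-duplication, preserving first-seen order.
            if tag.startswith(prefix) and tag not in seen:
                seen.add(tag)
                results.append(tag)
    return results


print(suggest(DOCS, "tw", ["user1"]))  # ['two', 'twenty five']
```

This scans every document on each request, so it is only useful as a reference
for checking the output of a real suggester.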

I am curious to know whether I can use the FST-based suggester
(FSTCompletionLookup). I would like to filter by a few fields (if not
many). I looked into Elasticsearch, and its context suggester (
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters.html#context-suggester)
seems to be a good fit.
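
As a rough sketch of how the example above might map onto the context
suggester, the Python below only builds the JSON request bodies; it does not
talk to a cluster. It assumes Elasticsearch 7.x or later (typeless mappings).
The field name tag_suggest, the context name user, the suggestion name tags,
and the helper functions are all hypothetical choices of mine.

```python
import json


def build_mapping():
    """Index mapping: a completion field with one category context."""
    return {
        "mappings": {
            "properties": {
                "tag_suggest": {
                    "type": "completion",
                    "contexts": [{"name": "user", "type": "category"}],
                }
            }
        }
    }


def build_document(tags, users):
    """One indexed document: tags are the suggestion inputs,
    users are the context values they can be filtered by."""
    return {"tag_suggest": {"input": tags, "contexts": {"user": users}}}


def build_query(prefix, users):
    """Suggest request: prefix match, filtered by user context,
    with duplicate suggestions across documents skipped."""
    return {
        "suggest": {
            "tags": {
                "prefix": prefix,
                "completion": {
                    "field": "tag_suggest",
                    "skip_duplicates": True,
                    "contexts": {"user": users},
                },
            }
        }
    }


print(json.dumps(build_query("tw", ["user1"]), indent=2))
```

The skip_duplicates option is what I would rely on for the "no duplicates"
requirement, since "two" appears in both doc1 and doc2.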

As the FST is held on the heap, I am wondering whether this will scale well,
or whether it will create GC or other scaling issues for me in the future.
Here are some things I would like to understand:

1. How does adding a context affect performance and memory footprint?
Does it create one FST for each unique combination of contexts?
2. What is the recommended number of shards (if I decide to put
this in a separate index)? Should I keep the number of shards to a minimum?
3. Does it scale well horizontally? As the index size grows, can I add more
machines and expect the system to scale well?

Any explanation of the internal implementation details would also help me
understand it better. I read
http://blog.mikemccandless.com/2010/12/using-finite-state-transducers-in.html
to get the overall idea. I would appreciate some practical advice on whether
this is a good approach.

I am also curious to hear whether there are alternatives to Elasticsearch that
people have used to provide suggestions on multivalued fields.

Thanks
Srini
