Hi, Rinat!

On Nov 02, Rinat Ibragimov wrote:
> > > I can't decide. From my point of view, the current approach is
> > > fine. Please pick a variant, and I'll try to implement that.
> >
> > No, I cannot guess which approach will produce more relevant
> > searches. Implement something and then we test what works better
>
> Variable-length n-grams approach is too innovative, and hard to reason
> about. I've never heard about such an approach, and it doesn't look
> good to me. So I'll stick with a simple slicer.

If you mean the variant that splits "n-grams approach" into "n-gr",
"gra", "ram", "ams", "ms a", "s ap", "app", ... then it's just "n
letters in every chunk", which is very easy to explain.

But ok, let's start simple and benchmark.

> > Of course, it can. Note that fts_get_word() doesn't generate n-grams
> > either, it gets the whole word and the n-gram plugin later splits it
> > into n-grams. Similarly param->mysql_parse() will extract words for
> > you and you'll split them into n-grams.
>
> Changed to use param->mysql_parse().
>
> Turns out that in Aria, MyISAM, and InnoDB, param->mysql_parse() does
> call back param->mysql_add_word(). Is it part of the plugin API?

Yes, it is. E.g. slide 12 from my old presentation:
http://conferences.oreillynet.com/presentations/mysql06/golubchik_sergei.pdf
shows that there are three points where a plugin can add functionality.

* It can extract the text and then call param->mysql_parse(). This
  makes it possible to parse, say, gzip-ed texts or EXIF comments in
  images.
* It can replace param->mysql_parse() to use different rules for
  splitting the text into words. This is what the n-gram plugin
  normally does.
* It can replace param->mysql_add_word() to post-process every word
  after the built-in parser did the splitting. For example, a stemming
  or soundex plugin can do that.

> Comments in include/mysql/plugin_ftparser.h do not mention that at
> all. That's why I initially thought that param->mysql_parse() will
> parse the string like the default parser does, without any way to
> interact with the process.

I've edited the comment to mention this possibility.

Regards,
Sergei
VP of MariaDB Server Engineering
and secur...@mariadb.org
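
For reference, the "n letters in every chunk" variant discussed at the top of this mail can be sketched roughly as below. This is only an illustration, not code from the patch: the names chunk_len(), slice_ngrams() and emit_fn are hypothetical, and it assumes that only ASCII letters and digits count towards n while any other byte is carried along inside the chunk, which is one reading of the "n-gr", "gra", ..., "ms a" example. A real plugin would use the column's charset instead of isalnum().

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of any MariaDB API.
   Returns the byte length of the chunk that starts at text[start] and
   ends at the n-th letter/digit, or 0 if fewer than n are left. */
size_t chunk_len(const char *text, size_t len, size_t start, size_t n)
{
  size_t counted = 0;
  for (size_t i = start; i < len; i++) {
    /* Non-alphanumeric bytes stay in the chunk but are not counted. */
    if (isalnum((unsigned char)text[i]) && ++counted == n)
      return i - start + 1;
  }
  return 0;
}

/* Hypothetical callback type: receives one chunk (not NUL-terminated). */
typedef void (*emit_fn)(const char *chunk, size_t len, void *arg);

/* Emits one chunk per starting letter/digit and returns the number of
   chunks. emit may be NULL if only the count is needed. */
size_t slice_ngrams(const char *text, size_t len, size_t n,
                    emit_fn emit, void *arg)
{
  size_t count = 0;
  assert(n > 0);
  for (size_t start = 0; start < len; start++) {
    if (!isalnum((unsigned char)text[start]))
      continue;                      /* chunks start on a letter/digit */
    size_t clen = chunk_len(text, len, start, n);
    if (clen == 0)
      break;                         /* fewer than n letters remain */
    if (emit)
      emit(text + start, clen, arg);
    count++;
  }
  return count;
}
```

With n = 3 and the text "n-grams approach" the first chunks are "n-gr", "gra", "ram", "ams", "ms a", "s ap", "app", matching the list above.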
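
The third hook point above (replace mysql_add_word(), then let the built-in parser do the word splitting) can be sketched as follows. To keep it self-contained it does not include plugin_ftparser.h: mock_param is a simplified stand-in for the real parameter struct (which has more fields, and whose add-word callback takes an extra boolean-info argument), and builtin_parse() and server_add_word() merely imitate the server side. All of these names are assumptions made for the illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Simplified stand-in for the parser parameter struct (assumption). */
typedef struct mock_param {
  int (*mysql_parse)(struct mock_param *, const char *doc, int doc_len);
  int (*mysql_add_word)(struct mock_param *, const char *word, int word_len);
  void *state;                       /* plugin-private data */
} mock_param;

/* Imitates the built-in parser: splits on non-alphanumeric bytes and
   calls back mysql_add_word() for every word it finds. */
int builtin_parse(mock_param *p, const char *doc, int doc_len)
{
  int i = 0;
  while (i < doc_len) {
    while (i < doc_len && !isalnum((unsigned char)doc[i])) i++;
    int start = i;
    while (i < doc_len && isalnum((unsigned char)doc[i])) i++;
    if (i > start && p->mysql_add_word(p, doc + start, i - start))
      return 1;
  }
  return 0;
}

/* Imitates the server's own add-word callback; it just accepts. */
int server_add_word(mock_param *p, const char *word, int word_len)
{
  (void)p; (void)word; (void)word_len;
  return 0;
}

typedef struct ngram_state {
  int (*server_add_word)(mock_param *, const char *, int); /* saved hook */
  int n;                             /* n-gram length in bytes */
  int emitted;                       /* n-grams passed on so far */
} ngram_state;

/* Replacement callback: slices each word into fixed-length n-grams and
   forwards them to the saved callback. Words shorter than n are dropped. */
int ngram_add_word(mock_param *p, const char *word, int word_len)
{
  ngram_state *st = p->state;
  for (int i = 0; i + st->n <= word_len; i++) {
    if (st->server_add_word(p, word + i, st->n))
      return 1;
    st->emitted++;
  }
  return 0;
}

/* Plugin entry point: install the hook, run the built-in parser,
   restore the original callback. Returns the n-gram count or -1. */
int ngram_plugin_parse(mock_param *p, const char *doc, int doc_len, int n)
{
  ngram_state st = { p->mysql_add_word, n, 0 };
  void *saved_state = p->state;
  p->state = &st;
  p->mysql_add_word = ngram_add_word;
  int rc = p->mysql_parse(p, doc, doc_len);
  p->mysql_add_word = st.server_add_word;
  p->state = saved_state;
  return rc ? -1 : st.emitted;
}
```

Unlike the slicer that crosses word boundaries, this variant only ever sees whole words, so "n-grams approach" with n = 3 yields n-grams from "grams" and "approach" and nothing from the one-letter word "n".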