Hi,

On Wed, 17 Jul 2019 at 23:27, Ludovic Courtès <l...@gnu.org> wrote:
> I guess computing the TF-IDF could perhaps improve the results compared
> to the current scoring mechanism. It would be worth trying to implement
> it.
>
> The bottom line though, as you wrote, is that this all depends on the
> quality of synopses and descriptions, and there’s only so much we can
> draw from 5-line descriptions.

In my opinion, because the description is, say, 5 lines plus the synopsis, one first needs to analyse the "quality" of the available information (words + dependencies) before implementing anything. I mean doing some "data science" (buzz buzz! :-)) with R or Python.

I do not know the state of the art of recommender systems, nor how they are applied to package retrieval. I have never read anything about that in other distributions (Debian, Gentoo, etc.). Does anyone know? Any pointers?

For example, the current scoring looks like a poor man's version of the Boolean model of Information Retrieval [1]. What about the Okapi model [2]? etc.

Well, if a student is reading this thread and is looking for a project. ;-)
And I will try to have a look after my summer holidays.

Please share your opinion or experience.

All the best,
simon

[1] https://en.wikipedia.org/wiki/Boolean_model_of_information_retrieval
[2] https://en.wikipedia.org/wiki/Okapi_BM25
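
PS: To make the Okapi suggestion a bit more concrete, here is a minimal sketch of BM25 ranking in Python. It is only an illustration under stated assumptions: the package names and descriptions are made up, the tokenizer is naive, and k1 = 1.2 and b = 0.75 are just commonly used defaults. None of this reflects how the current scoring is implemented.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query, docs, k1=1.2, b=0.75):
    """Return (name, score) pairs sorted by decreasing BM25 score.

    `docs` maps a package name to its synopsis + description text.
    `k1` and `b` are the usual BM25 tuning parameters (assumed defaults).
    """
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokens)
    avg_len = sum(len(words) for words in tokens.values()) / n_docs

    # Document frequency: in how many descriptions does each term occur?
    df = Counter()
    for words in tokens.values():
        df.update(set(words))

    query_terms = tokenize(query)
    scores = {}
    for name, words in tokens.items():
        tf = Counter(words)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF, always positive thanks to the "+ 1".
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term frequency saturates, and long descriptions are penalised.
            norm = tf[term] + k1 * (1 - b + b * len(words) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[name] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical packages, invented for this example only.
PACKAGES = {
    "alpha-edit": "Text editor. A small extensible text editor for the terminal.",
    "beta-plot": "Plotting tool. Draw plots and charts from text files.",
    "gamma-zip": "Archive compressor. Compress and extract archive files.",
}

ranking = bm25_rank("text editor", PACKAGES)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With such short descriptions the length normalisation (b) has little to work with, which is exactly why the quality of the words should be analysed first.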