Re: Improving ‘guix search’ scoring

zimoun Thu, 18 Jul 2019 04:12:00 -0700

Hi,

On Wed, 17 Jul 2019 at 23:27, Ludovic Courtès <l...@gnu.org> wrote:


> I guess computing the TF-IDF could perhaps improve the results compared
> to the current scoring mechanism.  It would be worth trying to implement
> it.
>
> The bottom line though, as you wrote, is that this all depends on the
> quality of synopses and descriptions, and there’s only so much we can
> draw from 5-line descriptions.
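
For illustration, here is a rough Python sketch of TF-IDF ranking. It assumes a made-up toy corpus (`PACKAGES`, one synopsis-like string per package) and a naive tokenizer, and scores each package by the sum of TF-IDF weights of the query terms. It is not how `guix search` currently works; it only shows the technique.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus: package name -> synopsis/description text.
PACKAGES = {
    "emacs": "The extensible, customizable, self-documenting text editor",
    "vim": "Highly configurable text editor",
    "gcc": "GNU Compiler Collection for C and C++",
    "r-minimal": "Environment for statistical computing and graphics",
}

def tokenize(text):
    """Lower-case the text and split it into alphanumeric words (naive)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def tf_idf_scores(query, docs):
    """Score each document by the summed TF-IDF weight of the query terms."""
    tokenized = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for words in tokenized.values():
        df.update(set(words))
    scores = {}
    for name, words in tokenized.items():
        counts = Counter(words)
        score = 0.0
        for term in tokenize(query):
            if counts[term] == 0:
                continue  # term absent: contributes nothing
            tf = counts[term] / len(words)       # term frequency in this doc
            idf = math.log(n_docs / df[term])    # rarer terms weigh more
            score += tf * idf
        scores[name] = score
    return scores

scores = tf_idf_scores("text editor", PACKAGES)
ranking = sorted(scores, key=scores.get, reverse=True)
```

With this corpus, "vim" ranks above "emacs" for "text editor" because its description is shorter, so each matching word weighs more. Note also that a word occurring in every description gets an IDF of zero, which is exactly why short, repetitive descriptions limit what TF-IDF can do.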

In my opinion, because the description is only about 5 lines plus the
synopsis, before implementing anything one first needs to analyse
the "quality" of the available information (words + dependencies). I
mean doing some "data science" (buzz buzz! :-)) with R or Python.
I also do not know the state of the art of recommender systems, nor
how they are applied to package retrieval. I have never read anything
about that for other distributions (Debian, Gentoo, etc.). Has anyone?
Any pointers?

For example, the current scoring looks like a poor man's version of the
Boolean model of Information Retrieval [1]. What about the Okapi BM25
model [2]? etc.
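
To make the comparison concrete, a rough Python sketch of both models follows. The corpus is a made-up toy example, `boolean_match` is a textbook Boolean AND query (not Guix's actual scoring code), and `k1=1.2`, `b=0.75` are commonly used BM25 defaults, assumed here rather than tuned. The IDF variant is the one given on the Wikipedia page [2].

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus: package name -> synopsis/description text.
PACKAGES = {
    "emacs": "The extensible, customizable, self-documenting text editor",
    "vim": "Highly configurable text editor",
    "gcc": "GNU Compiler Collection for C and C++",
    "r-minimal": "Environment for statistical computing and graphics",
}

def tokenize(text):
    """Lower-case the text and split it into alphanumeric words (naive)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def boolean_match(query, text):
    """Boolean model: a document matches iff it contains every query term."""
    return set(tokenize(query)) <= set(tokenize(text))

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Okapi BM25: saturating term frequency plus document-length normalization."""
    tokenized = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokenized)
    avgdl = sum(len(words) for words in tokenized.values()) / n_docs
    df = Counter()  # number of documents containing each term
    for words in tokenized.values():
        df.update(set(words))
    scores = {}
    for name, words in tokenized.items():
        counts = Counter(words)
        # Longer-than-average documents are penalized through `norm`.
        norm = k1 * (1 - b + b * len(words) / avgdl)
        score = 0.0
        for term in tokenize(query):
            f = counts[term]
            if f == 0:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            score += idf * f * (k1 + 1) / (f + norm)
        scores[name] = score
    return scores
```

For the query "text editor", the Boolean model only says that "emacs" and "vim" both match and "gcc" does not; BM25 additionally ranks "vim" first because its description is shorter than average.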

Well, if a student is reading this thread and is looking for a project. ;-)

I will try to take a look after my summer holidays.
Please share your opinions or experience.


All the best,
simon


[1] https://en.wikipedia.org/wiki/Boolean_model_of_information_retrieval
[2] https://en.wikipedia.org/wiki/Okapi_BM25
