On Tue, 22 Nov 2005 18:51:44 +0100
Sven Vermeulen <[EMAIL PROTECTED]> wrote:
>> A good start could be to do that the quick and ugly way, thanks
>> to Google (with some "site:www.gentoo.org/some/thing/" and other
>> black magic in the query terms).
> [...]

> - Google bases its search functionality on cached pages. 

Bah, yes, in theory it's not 100% perfect, but in practice I
find it satisfactory.

> - We would depend on Google a bit

Yes, but other engines may offer similar functionality, in which
case it would just be a matter of changing the form's parameter
names and posting it elsewhere. But I don't know much about other
public search engines, so I have no idea what kinds of queries they
allow.
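
As a rough illustration of "changing the form's params names", the
engine-specific bits could live in one small table. This is only a
sketch: the "other" entry is an invented placeholder (no real engine
is implied), and whether another engine understands "site:"-style
restrictions would have to be checked separately.

```javascript
// Per-engine settings. Only the Google entry describes a real service;
// "other" is a hypothetical placeholder used to show the idea.
const ENGINES = {
  google: { action: "https://www.google.com/search", queryParam: "q" },
  other: { action: "https://search.example.org/find", queryParam: "query" }
};

// Build the URL a GET form would submit to for the chosen engine.
function searchUrl(engineName, query) {
  const engine = ENGINES[engineName];
  if (!engine) {
    throw new Error("unknown engine: " + engineName);
  }
  return engine.action + "?" + engine.queryParam + "=" +
    encodeURIComponent(query);
}
```

Switching engines then means editing one table entry rather than the
form itself.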

> Now Google might be a reliable web site/service, I'd rather have
> the search functionality of our web site implemented on the
> Gentoo infrastructure.

Sure, if that's doable in terms of workload and time to implement,
then it could be the best method.

My only concern would be about the choice of that engine: I mean,
I would still prefer Google over an internal engine which doesn't
allow mixing of exact strings and keywords in queries, or which
drops non-alpha chars, etc. I'm suffering enough with the forums'
engine already :)

> - Restricting pages to /doc (documentation), /main (Gentoo
> information), /news (News items+GWN), /proj (project stuff)

Not a problem with Google: that's the "/some/thing/" part of
the fake query quoted above. I've put some real examples in the
proof-of-concept form I posted about in an earlier message
elsewhere in this thread:
http://tdegreni.free.fr/gentoosearch/
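
To make the "site:" restriction concrete, here is a minimal sketch of
how such a query could be assembled. It is not taken from the
proof-of-concept form: the function name buildSearchUrl is made up,
and the code only assumes Google's usual /search?q= endpoint and its
"site:" operator.

```javascript
// Hypothetical helper: restrict a Google search to one section of
// www.gentoo.org by prefixing the terms with a "site:" operator.
// "section" is a path such as "/doc/en/" or "/proj/".
function buildSearchUrl(terms, section) {
  const query = "site:www.gentoo.org" + section + " " + terms;
  // encodeURIComponent keeps quotes and other non-alpha characters
  // intact, so exact strings and plain keywords can be mixed.
  return "https://www.google.com/search?q=" + encodeURIComponent(query);
}
```

For instance, buildSearchUrl('"emerge --sync" portage', "/doc/en/")
mixes an exact string with a keyword and limits the hits to the
English documentation.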

> - Restricting languages (en, fr, ... and any combination)

Same as above for searching in a single language: add "/fr/" to
the base URL (the lr=lang_fr parameter is another option, although
it's less reliable). But for arbitrary combinations, yes, that's
probably a limitation (or a really ugly query...).
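
The two single-language variants could look roughly like this. It is
a sketch under assumptions: the helper name localisedSearchUrl is
invented, and the site is assumed to be laid out as /<area>/<lang>/
(as in "/doc/en/"), which may not hold for every section.

```javascript
// Hypothetical helper. "area" is a top-level section such as "doc",
// "lang" a two-letter code such as "fr".
function localisedSearchUrl(terms, area, lang, viaParam) {
  const base = "https://www.google.com/search?q=";
  if (viaParam) {
    // Variant 2: search the whole section and ask Google to filter
    // by language with lr=lang_xx (reported above as less reliable).
    const broad = "site:www.gentoo.org/" + area + "/ " + terms;
    return base + encodeURIComponent(broad) +
      "&lr=lang_" + encodeURIComponent(lang);
  }
  // Variant 1: put the language directory into the site: path.
  const narrow = "site:www.gentoo.org/" + area + "/" + lang + "/ " + terms;
  return base + encodeURIComponent(narrow);
}
```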

What I had in mind for i18n of the above JS code was to:
 - always offer at least a search on the English pages
 - if the user has set a non-English preferred language in the
browser, also add some localised choices to the dropdown list.
(I'm not sure how to detect the user's preferred language from
JavaScript, though.)
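
On the detection question, one possible approach is sketched below.
Browsers expose navigator.language, and current ones also
navigator.languages (an ordered list); old Internet Explorer used
navigator.userLanguage instead. The function names and the
"available" list of translated languages are assumptions made for
the example, and the navigator object is passed in as a parameter so
the code can be tried outside a browser.

```javascript
// Return the user's preferred languages as primary subtags, in order
// ("fr-CA" becomes "fr"). "nav" is a navigator-like object; in a
// browser, pass window.navigator.
function preferredLanguages(nav) {
  const raw = (nav.languages && nav.languages.length)
    ? Array.from(nav.languages)
    : [nav.language || nav.userLanguage || "en"];
  const seen = [];
  for (const tag of raw) {
    const primary = String(tag).split("-")[0].toLowerCase();
    if (primary && !seen.includes(primary)) {
      seen.push(primary);
    }
  }
  return seen;
}

// English is always offered; other preferred languages are added
// only if they appear in "available" (assumed list of translations).
function searchChoices(nav, available) {
  const choices = ["en"];
  for (const lang of preferredLanguages(nav)) {
    if (lang !== "en" && available.includes(lang)) {
      choices.push(lang);
    }
  }
  return choices;
}
```

The result of searchChoices(navigator, [...]) could then be used to
fill the dropdown list, with English first.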

> - Have the search points assigned so that hits are calculated
> with certain weights:
>     * title's get most of the points, unless many titles are
> selected
>     * abstract's get the second most points, yada yada
>     * content get third most points

Here again, I think Google is good enough for our needs, especially
if you target the search at "/doc/en/" or similar sub-parts of
the website, which don't contain that many pages anyway. I mean, I
often do that kind of search on the docs or the dev handbook with
a conquery plugin, and I don't remember the page I was looking for
ever not being in the top 5 results. But yes, at least in theory,
a tweaked local engine could be even better.


Hmm... re-reading the above message, I realize I may sound like
some kind of Google zealot: so just to make it clear, I'm not, and I
would be pleased to see anything better implemented. It's really
just that I think it could do a rather good job and that using it is
easy enough to serve as a solution in the very short term.

--
TGL.
-- 
gentoo-dev@gentoo.org mailing list
