On Tue, 22 Nov 2005 18:51:44 +0100
Sven Vermeulen <[EMAIL PROTECTED]> wrote:

>> A good start could be to do that the quick and ugly way, thanks
>> to Google (with some "site:www.gentoo.org/some/thing/" and other
>> black magic in the query terms).
> [...]
> - Google bases its search functionality on cached pages.

Bah, yes, in theory it's not 100% perfect, but in practice I find it
satisfying.

> - We would depend on Google a bit

Yes, but if other engines offer similar functionality, it would just
be a matter of changing the form's parameter names and posting it
elsewhere. But I don't know much about other public search engines,
so I have no idea what kind of queries they allow.

> Now Google might be a reliable web site/service, I'd rather have
> the search functionality of our web site implemented on the
> Gentoo infrastructure.

Sure, if that's doable in terms of workload and time to implement,
then it could be the best method. My only concern would be the
choice of that engine: I mean, I would still prefer Google over an
internal engine which doesn't allow mixing exact strings and
keywords in queries, or which drops non-alpha chars, etc. I'm
suffering enough with the forum's one already :)

> - Restricting pages to /doc (documentation), /main (Gentoo
> information), /news (News items+GWN), /proj (project stuff)

Not a problem with Google, that's the "/some/thing/" part of the
fake query cited above. I've put some real examples in the
proof-of-concept form I posted about in an earlier message
elsewhere in this thread:
  http://tdegreni.free.fr/gentoosearch/

> - Restricting languages (en, fr, ... and any combination)

Same as above for searching in a single language: add some "/fr/" to
the base URL (also possible using the lr=lang_fr parameter, although
it's less reliable). But for arbitrary combinations, yes, that's
probably a limitation (or a really ugly query...).

What I had in mind for i18n of the above JS code was to:
 - always at least propose search on the English pages
 - if the user has defined a non-English preferred language in his
   browser, also add some localised choices to the dropdown list.
   (I'm not sure how to detect the user's preferred language from
   JavaScript though).
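
The two ideas above (a "site:"-restricted query and a language
dropdown that always offers English first) could be sketched roughly
as follows. This is only an illustration under stated assumptions, not
the code behind the proof-of-concept form: the function names, the
"doc/en" style paths and the list of available translations are
hypothetical. Only the "site:" operator and the lr=lang_xx parameter
come from the discussion itself.

```javascript
// Hypothetical sketch: helper names and example paths are assumptions.
const GOOGLE_SEARCH = "https://www.google.com/search";

// Build a Google search URL restricted to one sub-tree of www.gentoo.org.
// sitePath is something like "doc/en" or "proj"; lrLang is optional and,
// when given, adds the (less reliable) lr=lang_xx language restriction.
function buildSearchUrl(terms, sitePath, lrLang) {
  const query = terms + " site:www.gentoo.org/" + sitePath + "/";
  let url = GOOGLE_SEARCH + "?q=" + encodeURIComponent(query);
  if (lrLang) {
    url += "&lr=lang_" + encodeURIComponent(lrLang);
  }
  return url;
}

// Compute the language entries for the dropdown list.
// preferred: language tags in the user's order of preference ("fr-FR", ...)
// available: languages for which translated pages are assumed to exist
// English always comes first; other languages follow in preference order.
function languageChoices(preferred, available) {
  const choices = ["en"];
  for (const tag of preferred) {
    const primary = tag.toLowerCase().split("-")[0]; // "fr-FR" -> "fr"
    if (available.includes(primary) && !choices.includes(primary)) {
      choices.push(primary);
    }
  }
  return choices;
}

// Example: mix an exact string and a keyword, restricted to the English docs.
console.log(buildSearchUrl('"emerge --sync" portage', "doc/en"));
console.log(languageChoices(["fr-FR", "en-US"], ["en", "fr", "de"]));
```

In a browser, the preferred list could be taken from
navigator.languages (or [navigator.language] where only that property
exists). Those values reflect the browser's own language settings, so
whether they match what the user actually wants is an assumption.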
> - Have the search points assigned so that hits are calculated
> with certain weights:
>   * title's get most of the points, unless many titles are
>     selected
>   * abstract's get the second most points, yada yada
>   * content get third most points

Here again, I think Google is good enough for the needs, especially
if you target the search on "/doc/en/" or similar sub-parts of the
website, which don't leave that many pages anyway. I mean, I often
do that kind of search on the docs or the dev handbook with a
conquery plugin, and I don't remember ever seeing the page I was
looking for not being in the top 5 results. But yes, at least in
theory, a tweaked local engine could be even better.

Hmm... re-reading the above message, I realize I may sound like some
kind of Google zealot: so just to make it clear, I'm not, and I would
be pleased to see anything better implemented. It's really just that
I think it could do a rather good job, and that using it is easy
enough to make it a really short-term solution.

--
TGL.
--
gentoo-dev@gentoo.org mailing list