On Thu, Nov 29, 2007 at 10:20:11AM -0500, Mike Frysinger wrote:
> On Tuesday 13 November 2007, Robin H. Johnson wrote:
> > If you had bookmarks to the old style of URL, please consult the FAQ for
> > the new form. We are NOT rewriting these URLs:
> > '/packages/?category=media-sound;name=mp3unicode'
> > (The new form is '/package/media-sound/mp3unicode').
> why? you've just broken every site out there that links to us in the common
> form you've quoted here. there's no reason you can't add three lines of
> code to check if the "category" GET variable exists and if so, redirect
> accordingly.

Because:
- Using ';' as an argument separator on the old site is not a valid query
  argument separator, and there are URLs out there that have added further
  arguments using it, complicating parsing.
- See also RFC 1738: 'Within the <path> and <searchpart> components, "/",
  ";", "?" are reserved.'
- The old site allowed a LOT of variations, all leading to the same content,
  but some of which broke badly:
    /?category=foo&name=bar
    /?category=foo;name=bar
    /?name=bar&category=foo
    /?name=bar;category=foo;this=wasbroken
    /packages/?(one of the above query strings)
    (several more prefixes, all of which gave you the same page)
- Having a single valid URL for a given resource greatly improves cache hit
  rates (and we do use caching heavily on the new site, 60% hit rate at the
  moment; see further down as well).
- The old parsing and variable-usage code was the source of multiple bugs,
  as well as the security issue that shuttered the site.
- I _want_ old sites to change to using the new form, which I advertise as
  permanent resource URLs (they are also much easier to construct: take any
  "[CAT/]PN[-PF]" and append it to the base URL, and you are done).
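For illustration, the ambiguity described above can be sketched in a few lines of Python. This is not the site's actual code; the function name and the None fallback are my own, and it simply shows how many old-style variants (either separator, any argument order) collapse to one canonical URL:

```python
from urllib.parse import urlsplit

def canonical_url(old_url):
    """Map an old-style query URL to the new canonical form.

    Accepts both '&' and ';' as separators and ignores argument
    order -- exactly the variation the old site tolerated.
    """
    query = urlsplit(old_url).query
    params = {}
    for pair in query.replace(";", "&").split("&"):
        if "=" in pair:
            key, _, value = pair.partition("=")
            params[key] = value
    if "category" in params and "name" in params:
        return "/package/%s/%s" % (params["category"], params["name"])
    return None  # not recognizable as an old-style package URL

print(canonical_url("/packages/?category=media-sound;name=mp3unicode"))
# → /package/media-sound/mp3unicode
```

Every variant in the list above maps to the same output, which is the point: one canonical URL per resource, instead of many spellings that each occupy their own cache slot.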
That said, if somebody wants to point me to something decent so that Squid
can rewrite the URLs WITH the query parameters (the built-in Squid stuff
seems to ignore them) and still hit the cache, and that can add a big
warning at the top of the page, I'd be happy to use it for a transition
period, just like the RSS URLs (which are redirected until January 2008,
but only because they are automated, and not browsed by humans).

On the subject of Squid, it would be extremely useful if it could ignore
some headers and respect others when deciding whether a page is already in
the cache, without stripping the headers from the request (this is doable
with Apache's mod_cache), so that two requests differing only slightly in
their User-Agent hit the same cache entry, while different Accept* headers
are respected and do not hit the same cache entry.

-- 
Robin Hugh Johnson
Gentoo Linux Developer & Infra Guy
E-Mail     : [EMAIL PROTECTED]
GnuPG FP   : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85