On Thu, Nov 29, 2007 at 10:20:11AM -0500, Mike Frysinger wrote:
> On Tuesday 13 November 2007, Robin H. Johnson wrote:
> > If you had bookmarks to the old style of URL, please consult the FAQ for
> > the new form. We are NOT rewriting these URLs:
> > '/packages/?category=media-sound;name=mp3unicode'
> > (The new form is '/package/media-sound/mp3unicode').
> why?  you've just broken every site out there that links to us in the common 
> form you've quoted here.  there's no reason you can't add three lines of code 
> to check if the "category" GET variable exists and if so, redirect 
> accordingly.
Because:
- The ';' used as an argument separator on the old site is not a valid
  query argument separator, and there are URLs out there that added
  further arguments with it, complicating parsing.
- See also RFC1738: 'Within the <path> and <searchpart> components, "/",
  ";", "?" are reserved.'
- The old site allowed a LOT of variations, all leading to the same
  content, but some of which broke badly:
  /?category=foo&name=bar
  /?category=foo;name=bar
  /?name=bar&category=foo
  /?name=bar;category=foo;this=wasbroken
  /packages/?(one of the above query strings)
  (several more prefixes, all of which gave you the same page)
- Having a single valid URL for a given resource greatly improves cache
  hit rates (and we do use caching heavily on the new site, 60% hit rate
  at the moment, see further down as well).
- The old parsing and variable usage code was the source of multiple
  bugs as well as the security issue that shuttered the site.
- I _want_ old sites to change to using the new form, which I do
  advertise as being permanent resource URLs (as well as being much
  easier to construct, take any "[CAT/]PN[-PF]" and slap it onto the
  base URL, and you are done).
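
The mapping above is simple enough to sketch; a rough illustration in
Python (my own sketch, not the site's actual code), showing how all the
old variations collapse to one canonical URL:

```python
from urllib.parse import urlsplit

def old_to_new(url):
    """Map an old-style query URL to the new canonical resource URL.

    Tolerates both '&' and the non-standard ';' separator, any
    parameter order, and stray extra arguments.
    """
    query = urlsplit(url).query
    params = {}
    for pair in query.replace(";", "&").split("&"):
        if "=" in pair:
            key, _, value = pair.partition("=")
            params[key] = value
    if "category" in params and "name" in params:
        return "/package/%s/%s" % (params["category"], params["name"])
    return None  # not an old-style package URL

# Every broken variation yields the same canonical resource:
# old_to_new('/packages/?category=media-sound;name=mp3unicode')
#   -> '/package/media-sound/mp3unicode'
```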

That said, if somebody wants to point me to something decent so that
Squid can rewrite the URLs WITH the query parameters (the built-in squid
stuff seems to ignore them) and hit the cache, and that can add a big
warning at the top of the page, I'd be happy to use it for a transition
period, just like the RSS URLs (which are redirected until January 2008,
but only because they are automated, and not browsed by humans).
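For that transition period, a small redirector helper could do the
mapping. A sketch only, assuming the classic Squid redirector protocol
(one request per stdin line, one answer per stdout line, blank meaning
"leave the URL alone"); the base URL is a placeholder for the real
vhost:

```python
#!/usr/bin/env python
import sys
from urllib.parse import urlsplit

BASE = "http://packages.gentoo.org"  # placeholder; use the real vhost

def redirect_for(url):
    """Return a '301:...' answer for old-style query URLs, else None."""
    query = urlsplit(url).query
    params = dict(p.partition("=")[::2]
                  for p in query.replace(";", "&").split("&") if "=" in p)
    if "category" in params and "name" in params:
        return "301:%s/package/%s/%s" % (BASE, params["category"],
                                         params["name"])
    return None

if __name__ == "__main__":
    for line in sys.stdin:
        # First whitespace-separated field of each request line is the URL.
        answer = redirect_for(line.split(None, 1)[0])
        sys.stdout.write((answer + "\n") if answer else "\n")
        sys.stdout.flush()  # Squid expects line-at-a-time, unbuffered answers
```

That still wouldn't add the warning banner, and (as noted) the stock
helpers seem to ignore the query part, which is the whole problem.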

On the subject of Squid: it would be extremely useful if it could
ignore some headers and respect others when deciding whether a page is
already in the cache, without stripping the headers from the request
(this is doable with Apache's mod_cache). Two requests that differ
only slightly in User-Agent should hit the same cache entry, while
requests with different Accept* headers should be respected and kept
in separate entries.
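
To picture the behaviour I'm after (a sketch in Python of the cache-key
logic, not anything Squid supports today as far as I know):

```python
def cache_key(url, headers):
    """Build a cache key that ignores User-Agent entirely but keeps
    every Accept* header, so two browsers share an entry unless they
    actually negotiate different content."""
    respected = sorted(
        (name.lower(), value)
        for name, value in headers.items()
        if name.lower().startswith("accept")
    )
    return (url, tuple(respected))

# Different User-Agent, same Accept headers: one cache entry.
a = cache_key("/package/media-sound/mp3unicode",
              {"User-Agent": "Firefox/2.0", "Accept-Encoding": "gzip"})
b = cache_key("/package/media-sound/mp3unicode",
              {"User-Agent": "Konqueror/3.5", "Accept-Encoding": "gzip"})
assert a == b
```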

-- 
Robin Hugh Johnson
Gentoo Linux Developer & Infra Guy
E-Mail     : [EMAIL PROTECTED]
GnuPG FP   : 11AC BA4F 4778 E3F6 E4ED  F38E B27B 944E 3488 4E85
