: At the moment I need something quite simple. To identify a page that
: appears in many forms, e.g.:
:
: - Normal version
: - Split across several pages
: - Print version
: - From a different section (different styling and navigation elements)
:
: Basically identical content, presented in different ways.

Actually, your "Split across several pages" comment implies that you want a system which can tell that page 1 of a multipage article should be grouped with page 2 -- which may be radically different content. Most multipage documents have very different text on subsequent pages, so I'm not sure that a programmatic solution is going to be able to spot that.

I may also be reading too much into your message, but it sounds like you aren't trying to index generic content -- it sounds like you are trying to index content under your control (i.e. content on your own web site). If that's the case, then presumably you know something about the source data and the URL structure -- maybe you could solve this problem when you build your index.

For example, if I look at a site like perl.com, I can see a pattern in the way the article URLs look...

page 1...
  http://www.perl.com/pub/a/2005/02/17/3d_engine.html
page 2, etc...
  http://www.perl.com/pub/a/2005/02/17/3d_engine.html?page=2
printable...
  http://www.perl.com/lpt/a/2005/02/17/3d_engine.html

So instead of putting all of those URLs in the index as separate docs, why not create a single doc, with all of those URLs?

-Hoss
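
A minimal sketch of the URL-grouping idea above, in Python. It assumes the only variations are the ones visible in the three perl.com example URLs (a `page` query parameter for later pages, and an `/lpt/` path prefix in place of `/pub/` for the printable version). The function names `canonical_url` and `group_by_article` are hypothetical, and a real site would need its own rules.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_url(url):
    """Map a later-page or printable URL onto the article's page-1 URL."""
    parts = urlsplit(url)
    path = parts.path
    # Assumption: the printable version differs only by /lpt/ vs /pub/.
    if path.startswith("/lpt/"):
        path = "/pub/" + path[len("/lpt/"):]
    # Assumption: pagination is carried only by the "page" query parameter.
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k != "page"]
    return urlunsplit((parts.scheme, parts.netloc, path, urlencode(kept), ""))


def group_by_article(urls):
    """Collect every URL variant under its canonical article URL."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonical_url(url)].append(url)
    return dict(groups)


urls = [
    "http://www.perl.com/pub/a/2005/02/17/3d_engine.html",
    "http://www.perl.com/pub/a/2005/02/17/3d_engine.html?page=2",
    "http://www.perl.com/lpt/a/2005/02/17/3d_engine.html",
]
groups = group_by_article(urls)
```

Each key in `groups` would then become one index document, with the list of variant URLs stored as a multi-valued field on it.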