Hello,

--- Chakra Yadavalli <[EMAIL PROTECTED]> wrote:
> Hello, I am not sure if this is the right question for this list but
> it is in regards to search engines.

This is not really the right place to ask these types of questions,
but...  robots at mccmedia.com may be a better place to ask, or one of
those forums where people who track phenomena such as Google Dances
exchange information.

> Suppose you have a website that hosts some protected content that is
> accessible only to registered users. How do you make the content
> searchable by Google and other popular web search engines? The idea
> is not to reveal the content even via the "Google cache."
>
> Here is what I am thinking...
> Using Lucene (or its derivatives), skim through the "protected
> content," remove all the common stop words, stem the words, and place
> the resulting text files in a directory available to the search bots
> (via robots.txt rules). That way, even if the content is cached by
> the search engines, it does not make much sense to humans, but it
> will still enable them to search it. When they click on the link to
> the skimmed files, we need to redirect them to the login/register
> page, and upon successful login they should be redirected to the
> actual human-readable page that corresponds to the "skimmed content."
> Note that the "protected content" may be living in a Content
> Management System or a database.
> 
> Am I overthinking/engineering it? Any ideas are really appreciated.

What you described is doable.  You will have to detect the Googlebot
user agent and feed it indexable text, while redirecting real users to
the protected area.
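
A rough sketch of that routing decision, in Python, might look like the
following.  The paths (/login, /skimmed/, /protected/), the "next"
query parameter, and the function names are all made up for
illustration, and matching on the User-Agent string is only a
heuristic, since any client can send a header containing "Googlebot".

```python
# Hypothetical path; substitute whatever your site actually uses.
LOGIN_URL = "/login"

# Substrings assumed to identify crawlers we want to feed skimmed text.
CRAWLER_TOKENS = ("googlebot",)


def is_crawler(user_agent):
    """Return True if the User-Agent header looks like a known crawler."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def choose_response(user_agent, logged_in, page_id):
    """Decide what to do with a request for a skimmed page.

    Returns a (status, target) pair: 200 plus the file to serve,
    or 302 plus the URL to redirect to.
    """
    if is_crawler(user_agent):
        # Crawlers get the skimmed, indexable text.
        return 200, "/skimmed/%s.txt" % page_id
    if logged_in:
        # Registered users go straight to the readable page.
        return 302, "/protected/%s" % page_id
    # Everyone else logs in first; remember where they were headed.
    return 302, "%s?next=/protected/%s" % (LOGIN_URL, page_id)
```

The function only makes the decision; wiring it into a servlet, CGI
script, or web framework is left to whatever the site already runs.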

Using Lucene is overkill, though.  You can easily remove stop words
with a simple Perl script, for example.  Another thing you could do is
just shuffle the words around.  The current generation of search
engines typically doesn't care about (make use of) word order, while a
human will be lost if you shuffle the words.
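
To illustrate, here is a minimal sketch of the stop-word removal and
shuffling, written in Python rather than Perl.  The stop list is a tiny
made-up sample (a real one would be much longer), the function name
skim() is hypothetical, and no stemming is attempted.

```python
import random
import re

# A tiny illustrative stop list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "that"}


def skim(text, seed=None):
    """Lowercase the text, drop stop words, and shuffle what remains."""
    # Split into simple word tokens; punctuation is discarded.
    words = re.findall(r"[a-z0-9']+", text.lower())
    kept = [w for w in words if w not in STOP_WORDS]
    # A fixed seed gives a repeatable order; None uses system entropy.
    random.Random(seed).shuffle(kept)
    return " ".join(kept)


if __name__ == "__main__":
    print(skim("The idea is not to reveal the content of the page."))
```

Each run with seed=None produces a different ordering of the same
words, so the output keeps the vocabulary but not the sentences.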

Otis


> Thanks in advance,
> Chakra
> -- 
> Visit my weblog: http://www.jroller.com/page/cyblogue
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 
