Hi, I am wondering how the Google and AltaVista search engines determine the language used in a web page. I can see that they find pages written in Romanian, for example, even though the header of the file is the same as for English ones. Do they search for certain words? If you think this is the only solution, is there a Perl module that can do that?
Thanks.

Teddy's Center: http://teddy.fcc.ro/
Mail: [EMAIL PROTECTED]

----- Original Message -----
From: "Kevin Meltzer" <[EMAIL PROTECTED]>
To: "Octavian Rasnita" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Sunday, September 08, 2002 8:54 PM
Subject: Re: Getting the web page language

Does it matter what language, or what charset?

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"/>

You can always look at this and the lang="foo" attributes to try to determine what language, or at least what charset, the page is in. Of course, a charset (like iso-8859-1) can cover many languages, but at least you can narrow it down a little if you don't find a lang="foo" attribute.

Cheers,
Kevin

On Sun, Sep 08, 2002 at 08:05:18AM +0300, Octavian Rasnita ([EMAIL PROTECTED]) said something similar to:
> Hi all,
>
> I want to create a search engine. Please tell me how I can find out the
> languages used in a web page.
> I know that HTML 4.01 uses <html lang="en"> for example, but most web
> pages don't use this attribute.
>
> What should I test to find the language used?
>
> Thank you.
>
> Teddy's Center: http://teddy.fcc.ro/
> Mail: [EMAIL PROTECTED]
>
> --
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]

--
[Writing CGI Applications with Perl - http://perlcgi-book.com]
You are all the Buddha.
        -- Buddha (last words)
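
For what it's worth, the two ideas in this thread (check the declared lang/charset first, then fall back to looking for common words) can be combined in a few lines. The following is only a rough sketch in Python, not what any real search engine does: the names guess_language, STOPWORDS and _HintParser are made up for illustration, and the word lists are tiny placeholders that a real system would replace with much larger lists or character n-gram statistics. The same approach should port to Perl without much trouble.

```python
import re
from html.parser import HTMLParser

# Tiny illustrative stopword lists (assumption: far too small for real use).
# "si" is listed without diacritics because many pages omit them.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "is", "in", "that", "for", "with", "this"},
    "ro": {"și", "si", "este", "în", "pentru", "care", "din", "sau", "sunt", "mai"},
}


class _HintParser(HTMLParser):
    """Collects the declared language, the declared charset, and visible text."""

    def __init__(self):
        super().__init__()
        self.lang = None
        self.charset = None
        self.text = []
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "html" and attrs.get("lang"):
            self.lang = attrs["lang"].lower()
        elif tag == "meta":
            if attrs.get("charset"):
                self.charset = attrs["charset"].lower()
            elif (attrs.get("http-equiv") or "").lower() == "content-type":
                # e.g. content="text/html; charset=iso-8859-1"
                m = re.search(r"charset=([\w-]+)", attrs.get("content") or "", re.I)
                if m:
                    self.charset = m.group(1).lower()
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.text.append(data)


def guess_language(html):
    """Return (language, charset, method) for an HTML string.

    method is "declared" when <html lang="..."> was present, "stopwords"
    when the guess came from word counts, and "unknown" otherwise.
    """
    parser = _HintParser()
    parser.feed(html)
    parser.close()

    if parser.lang:
        return parser.lang, parser.charset, "declared"

    # Split the visible text into lowercase runs of letters.
    words = re.findall(r"[^\W\d_]+", " ".join(parser.text).lower())
    scores = {
        lang: sum(1 for w in words if w in stops)
        for lang, stops in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        return None, parser.charset, "unknown"
    return best, parser.charset, "stopwords"
```

Calling guess_language(page_source) on a page with no lang attribute falls through to the word counts, and the charset (when declared) is returned alongside so it can be used to narrow things down, as Kevin suggests. Short pages or pages mixing languages will fool a counter this simple, so treat the result as a hint only.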