On Thursday 21 February 2013 at 13:16 +0400, Lawr Eskin wrote:
> Hello dear R-help mailing list.
>
> > Looks like the same issue in Russian:
> >
> > library(RCurl)
> > library(XML)
> > u = " http://www.cian.ru/cat.php?deal_type=2&obl_id=1&room1=1"
> > a = getURL(u)
> > a # Here - the Russian is fine.
> > a2 <- htmlParse(a)
> > a2 # Here it is a mess...
> >
> > None of these seem to fix it:
> >
> > htmlParse(a, encoding = "windows-1251")
> > htmlParse(a, encoding = "CP1251")
> > htmlParse(a, encoding = "cp1251")
> > htmlParse(a, encoding = "iso8859-5")
> >
> > This is my locale:
> >
> > Sys.getlocale()
> > "LC_COLLATE=Russian_Russia.1251;LC_CTYPE=Russian_Russia.1251;LC_MONETARY=Russian_Russia.1251;LC_NUMERIC=C;LC_TIME=Russian_Russia.1251"
> >
> > Any suggestions?

What does Encoding(a) say?
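For reference, here is a minimal sketch of that check. It does not fetch the page: the `a` below is a hypothetical stand-in built from six hard-coded windows-1251 bytes, and `a_utf8` is a name introduced only for this illustration. It shows what Encoding() reports for a string with no declared encoding, and what it reports after an explicit iconv() conversion to UTF-8.

```r
# Hypothetical stand-in for the downloaded text: a six-letter Russian word
# given as raw windows-1251 (CP1251) bytes, so no network access is needed.
cp1251_bytes <- as.raw(c(0xCF, 0xF0, 0xE8, 0xE2, 0xE5, 0xF2))
a <- rawToChar(cp1251_bytes)

# A string built from raw bytes carries no declared encoding.
Encoding(a)

# Convert explicitly from CP1251 to UTF-8; iconv() marks the result.
a_utf8 <- iconv(a, from = "CP1251", to = "UTF-8")
Encoding(a_utf8)

# Round trip back to CP1251 to confirm the conversion lost nothing.
identical(charToRaw(iconv(a_utf8, from = "UTF-8", to = "CP1251")),
          cp1251_bytes)
```

If Encoding(a) on the real download reports "unknown", a conversion like this before calling htmlParse() may be worth trying. Whether it fixes the output on the original poster's machine is untested here.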
(FWIW, here on Linux even a is not in the correct encoding:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><head>
<title>ГЉГіГЇГЁГІГј îäГîêîìГГ ГІГГіГѕ êâà ðòèðó Гў Ìîà ±ГЄГўГҐ В— 11430 îáúÿâëåГГЁГ© Г® ïðîäà æå îäГîêîìà à òГûõ êâà ðòèð</title>
[...])

Regards

> Thanks you very much in advance,
>
> Lavrentiy Eskin
> <http://www.eng.nvg.ru>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.