Re: [R] XML and RCurl: problem with encoding (htmlTreeParse)

2010-07-03 Thread Ryusuke Kenji
Hi Prof, Thank you for your reply. Sorry that I left out the information below. > Sys.getlocale() [1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252" I have just noticed that tra

Re: [R] XML and RCurl: problem with encoding (htmlTreeParse)

2010-07-03 Thread Duncan Temple Lang
Hi Ryusuke, I would use the encoding parameter of htmlParse() and download and parse the content in one operation: htmlParse("http://home.sina.com", encoding = "UTF-8"). If you want to use getURL() in RCurl, use the .encoding parameter. You didn't tell us the output of Sys.getlocale()
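
A minimal sketch of the suggestion above, assuming the XML package is installed. To keep it runnable offline, it parses a small in-memory UTF-8 string (a hypothetical stand-in for the downloaded page, not content from the thread) instead of fetching a URL; the network variants from the message are shown only as comments.

```r
library(XML)

# Hypothetical stand-in for a downloaded page: UTF-8 HTML with non-ASCII text.
html_text <- enc2utf8("<html><body><p>T\u00e4m\u00e4 on testi</p><p>plain</p></body></html>")

# Declare the encoding explicitly instead of relying on the session locale.
doc <- htmlParse(html_text, asText = TRUE, encoding = "UTF-8")

# Extract the text of every <p> node, as the posters do with xpathSApply().
paragraphs <- xpathSApply(doc, "//p", xmlValue)
print(paragraphs)

# With network access, the same argument would apply to a URL:
#   doc <- htmlParse("http://home.sina.com", encoding = "UTF-8")
# or, fetching with RCurl first (note the leading dot in .encoding):
#   txt <- RCurl::getURL("http://home.sina.com", .encoding = "UTF-8")
#   doc <- htmlParse(txt, asText = TRUE, encoding = "UTF-8")
```

Whether the text then displays correctly at the console can still depend on the locale reported by Sys.getlocale(), which is why that output is requested in the thread.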

Re: [R] XML and RCurl: problem with encoding (htmlTreeParse)

2010-07-02 Thread Ryusuke Kenji
Hi All, First method: > library(XML) > theurl <- "http://home.sina.com" > download.file(theurl, "tmp.html") > txt <- readLines("tmp.html") > txt <- htmlTreeParse(txt, error=function(...){}, useInternalNodes = TRUE) > g <- xpathSApply(txt, "//p", function(x) xmlValue(x)) > head(grep(" ", g, value

Re: [R] XML and RCurl: problem with encoding (htmlTreeParse)

2010-01-01 Thread Lauri Nikkinen
Thanks. Interestingly, your code works on my Mac 10.6.1 but not on my Win XP. See sessionInfo from below. Mac R: > sessionInfo() R version 2.9.2 (2009-08-24) i386-apple-darwin8.11.1 locale: fi_FI.UTF-8/fi_FI.UTF-8/C/C/fi_FI.UTF-8/fi_FI.UTF-8 attached base packages: [1] stats graphics grDevi

Re: [R] XML and RCurl: problem with encoding (htmlTreeParse)

2009-12-31 Thread Eduardo Leoni
In the meantime, try this. library(XML) theurl <- "http://www.aarresaari.net/jobboard/jobs.html" download.file(theurl, "tmp.html") txt <- readLines("tmp.html") txt <- htmlTreeParse(txt, error=function(...){}, useInternalNodes = TRUE) g <- xpathSApply(txt, "//p", function(x) xmlValue(x)) head(grep

Re: [R] XML and RCurl: problem with encoding (htmlTreeParse)

2009-12-31 Thread Lauri Nikkinen
Thanks, looking forward to that! Happy New Year! -Lauri 2009/12/31 Duncan Temple Lang : > Hi Lauri. > > I am in the process of making some changes > to the encoding in the XML package. I'll take a look > over the next few days. (Not certain precisely when.) > >  D. > > > > Lauri Nikkinen wrote: >

Re: [R] XML and RCurl: problem with encoding (htmlTreeParse)

2009-12-31 Thread Duncan Temple Lang
Hi Lauri. I am in the process of making some changes to the encoding in the XML package. I'll take a look over the next few days. (Not certain precisely when.) D. Lauri Nikkinen wrote: > Hi, > > I'm trying to get data from web page and modify it in R. I have a > problem with encoding. I'm no

[R] XML and RCurl: problem with encoding (htmlTreeParse)

2009-12-31 Thread Lauri Nikkinen
Hi, I'm trying to get data from a web page and modify it in R. I have a problem with encoding. I'm not able to get the encoding right in the htmlTreeParse command. See below: > library(RCurl) > library(XML) > > site <- getURL("http://www.aarresaari.net/jobboard/jobs.html") > txt <- readLines(tc <- textConn