Hi Prof,
Thank you for your reply. Sorry that I left out the information below.
> Sys.getlocale()
[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
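As a rough illustration of why an LC_CTYPE of code page 1252 matters here, the base-R sketch below takes the two UTF-8 bytes for "ö" and interprets them first as Windows-1252 and then as UTF-8. The byte values and variable names are made up for the illustration and are not taken from the page being parsed.

```r
# Hypothetical sample: the UTF-8 byte sequence for o-umlaut (U+00F6),
# as it might arrive from a UTF-8 encoded web page.
raw_bytes <- as.raw(c(0xC3, 0xB6))
s <- rawToChar(raw_bytes)

# Reading the bytes as Windows-1252 yields two separate characters.
wrong <- iconv(s, from = "CP1252", to = "UTF-8")

# Declaring the true encoding keeps the bytes as a single character.
right <- s
Encoding(right) <- "UTF-8"

print(nchar(wrong))  # number of characters when read as CP1252
print(nchar(right))  # number of characters when read as UTF-8
```

If the character counts differ, the text was most likely decoded with the locale's code page rather than the page's real encoding.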
I have just noticed that tra
Hi Ryusuke
I would use the encoding parameter of htmlParse() and
download and parse the content in one operation:
htmlParse("http://home.sina.com", encoding = "UTF-8")
If you want to use getURL() in RCurl, use the .encoding parameter.
You didn't tell us the output of Sys.getlocale()
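A minimal sketch of the encoding argument, assuming the XML package is installed. To keep it runnable without network access it parses a small made-up HTML string (not the page from this thread) with asText = TRUE. The getURL() variant is shown only as comments because it needs a live connection.

```r
library(XML)

# Hypothetical sample page held in memory; the o-umlaut is written as an
# escape so that the script itself stays plain ASCII.
html <- "<html><body><p>Ty\u00f6paikka</p></body></html>"

# Parse the text and state its encoding explicitly instead of relying
# on the locale.
doc <- htmlParse(html, asText = TRUE, encoding = "UTF-8")

# Extract the text of every <p> node.
paragraphs <- xpathSApply(doc, "//p", xmlValue)
print(paragraphs)

# RCurl route (not run here, needs network access):
# library(RCurl)
# page <- getURL("http://home.sina.com", .encoding = "UTF-8")
# doc2 <- htmlParse(page, asText = TRUE, encoding = "UTF-8")
```

Whether "UTF-8" is the right value depends on the page. The charset declared in its HTTP headers or meta tag is the thing to check.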
Hi All,
First method:
>library(XML)
>theurl <- "http://home.sina.com"
>download.file(theurl, "tmp.html")
>txt <- readLines("tmp.html")
>txt <- htmlTreeParse(txt, error=function(...){}, useInternalNodes =
TRUE)
>g <- xpathSApply(txt, "//p", function(x) xmlValue(x))
>head(grep(" ", g, value
Thanks. Interestingly, your code works on my Mac (10.6.1) but not on my
Win XP machine. See the sessionInfo() output below.
Mac R:
> sessionInfo()
R version 2.9.2 (2009-08-24)
i386-apple-darwin8.11.1
locale:
fi_FI.UTF-8/fi_FI.UTF-8/C/C/fi_FI.UTF-8/fi_FI.UTF-8
attached base packages:
[1] stats graphics grDevi
In the meantime, try this.
library(XML)
theurl <- "http://www.aarresaari.net/jobboard/jobs.html"
download.file(theurl, "tmp.html")
txt <- readLines("tmp.html")
txt <- htmlTreeParse(txt, error=function(...){}, useInternalNodes = TRUE)
g <- xpathSApply(txt, "//p", function(x) xmlValue(x))
head(grep
Thanks, looking forward to that!
Happy New Year!
-Lauri
2009/12/31 Duncan Temple Lang :
> Hi Lauri.
>
> I am in the process of making some changes
> to the encoding in the XML package. I'll take a look
> over the next few days. (Not certain precisely when.)
>
> D.
>
>
>
> Lauri Nikkinen wrote:
>
Hi Lauri.
I am in the process of making some changes
to the encoding in the XML package. I'll take a look
over the next few days. (Not certain precisely when.)
D.
Lauri Nikkinen wrote:
> Hi,
>
> I'm trying to get data from a web page and modify it in R. I have a
> problem with encoding. I'm no
Hi,
I'm trying to get data from a web page and modify it in R. I have a
problem with encoding. I'm not able to get the
encoding right in the htmlTreeParse command. See below.
> library(RCurl)
> library(XML)
>
> site <- getURL("http://www.aarresaari.net/jobboard/jobs.html")
> txt <- readLines(tc <- textConn