Hi Simon

I tried this on OS X, Linux and Windows and it works without any problem.
So there must be some strange interaction with your configuration.
Below are some things to try in order to get more information about the
problem.

It would be more informative to give us the explicit version information
about the packages, e.g. use sessionInfo().  Details are very important
in cases like this.
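
As a rough sketch of what to send back (the variable names info and pkg are
only for illustration, and the requireNamespace() guard is my addition so the
snippet does not stop if one of the packages is missing):

```r
# Capture the R version, platform, locale and attached packages.
info <- sessionInfo()
print(info)

# Report the installed version of each package used in the original post.
# requireNamespace() returns FALSE rather than erroring when a package is absent.
for (pkg in c("XML", "RCurl")) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    cat(pkg, as.character(packageVersion(pkg)), "\n")
  } else {
    cat(pkg, "is not installed\n")
  }
}
```

Pasting the printed output into your reply is usually enough.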

In addition to the versions of the packages, it is also important to identify
the version of libxml via the libxmlVersion() function.
(Mine is 2.07.03. Yours may still be in the 2.6.16 region. I can't recall the 
defaults on OS X 10.6.)
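
Something like the following should show it. This is a small sketch; the
name v is illustrative, and I am assuming the result is a list with major,
minor and patch elements, which str() will confirm or contradict:

```r
library(XML)

# Ask the XML package which libxml2 it was built against.
v <- libxmlVersion()

# Show the raw structure rather than relying on particular element names.
str(v)

# Assumption: elements named major, minor and patch exist in the result.
cat(paste(unlist(v[c("major", "minor", "patch")]), collapse = "."), "\n")
```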

Are you doing this in a GUI or at the command-line? If the former, try the
latter, i.e. run the commands in a terminal and see if that changes anything,
e.g. if any characters are causing problems.
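
To record which front-end and character settings a given session is using,
a few standard calls can be run in both the GUI and the terminal and the
results compared (the variable names here are illustrative only):

```r
# Which front-end is R running under? The value differs between a GUI
# session and a plain terminal session.
gui <- .Platform$GUI
print(gui)

# Locale and encoding details, in case particular characters are the trigger.
ctype <- Sys.getlocale("LC_CTYPE")
print(ctype)
print(l10n_info())
```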

Since you are seeing some of the HTML document appear on the console, the
problem is in the implicit call to print() after the call to htmlTreeParse().
The problem is likely to be delayed if you assign the result of htmlTreeParse()
to a variable, which does not trigger this call to print().
Then you can explore the tree and see if it is corrupted in some way.
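
A minimal sketch of that approach follows. To keep it self-contained, .x is a
small made-up HTML string standing in for the text returned by getURL(myurl);
the names doc and root are also just for illustration:

```r
library(XML)

# Hypothetical stand-in for the page text returned by getURL(myurl).
.x <- "<html><head><title>Search</title></head><body><p>IIM result</p></body></html>"

# Assigning the result means the whole tree is not printed to the console.
doc <- htmlTreeParse(.x, asText = TRUE)

# Inspect the tree piece by piece instead of printing all of it.
root <- xmlRoot(doc)
print(xmlName(root))   # name of the top-level node
print(xmlSize(root))   # number of child nodes
print(names(root))     # names of the child nodes

# Print one small subtree only.
print(root[["head"]])
```

If one of these steps hangs with the real page, that narrows down which part
of the tree is the culprit.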

Furthermore, you might use htmlParse(). It returns the tree in a very different
form, but one that can be manipulated with the same R functions and also with
XPath queries.
I "very rarely" (i.e. never) use htmlTreeParse() anymore.
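
For illustration, here is a short sketch of htmlParse() with XPath queries.
The HTML string, the link targets and the variable names are all made up for
the example; the real page would come from getURL(myurl):

```r
library(XML)

# Hypothetical stand-in for the downloaded page text.
.x <- "<html><body><a href='/a1'>First</a><a href='/a2'>Second</a></body></html>"

# htmlParse() returns an internal (C-level) document rather than an R list.
doc <- htmlParse(.x, asText = TRUE)

# XPath queries: pull the href attribute and the text of every <a> node.
links  <- xpathSApply(doc, "//a", xmlGetAttr, "href")
titles <- xpathSApply(doc, "//a", xmlValue)

print(links)
print(titles)
```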

 D.



On 8/25/11 8:41 AM, Simon Kiss wrote:
> Dear colleagues,
> I'm trying to parse the html content from this webpage:
> http://timesofindia.indiatimes.com/searchresult.cms?sortorder=score&searchtype=2&maxrow=10&startdate=2001-01-01&enddate=2011-08-25&article=2&pagenumber=1&isphrase=no&query=IIM&searchfield=&section=&kdaterange=30&date1mm=01&date1dd=01&date1yyyy=2001&date2mm=08&date2dd=25&date2yyyy=2011
> 
> Using the following code
> library(RCurl)
> library(XML)
> myurl<-c("http://timesofindia.indiatimes.com/searchresult.cms?sortorder=score&searchtype=2&maxrow=10&startdate=2001-01-01&enddate=2011-08-25&article=2&pagenumber=1&isphrase=no&query=IIM&searchfield=&section=&kdaterange=30&date1mm=01&date1dd=01&date1yyyy=2001&date2mm=08&date2dd=25&date2yyyy=2011")
> 
> .x<-getURL(myurl)
> htmlTreeParse(.x, asText=T)
> 
> This prints approximately 15 lines of the output from the html document and 
> then mysteriously stops. The command line prompt does not reappear and force 
> quit is the only option. 
> I'm running R 2.13 on Mac os 10.6 and the latest versions of XML and RCURL 
> are installed.
> Yours, Simon Kiss
> 
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
